Applying Tidy Data Principles to Curriculum Vitae
Background
I’m writing this blog post two weeks after being illegally fired as part of the nationwide termination of US federal probationary employees. I had been in my position for ten months, and the furthest thing from my mind was keeping my CV up to date. My most recent CV consisted of my application materials for the job from which I was just fired. That version was made in Microsoft Word and was tailored to the language and requirements of that specific job posting. Some of the pieces of information within the CV were adjusted to match keywords in the job posting, and others were included because they were specified in the job application guidance.
Now that I had a little bit of time to update my materials, I started thinking about how I could make it easier on myself in the future. Over the last few years I’ve moved essentially all of my workflows into R, and I thought there must be something there to help with this problem. I quickly found the excellent vitae package (O’Hara-Wild and Hyndman 2024), which provides easy-to-understand functions for constructing elements of a CV, as well as some sharp-looking templates for rendering to a PDF document.
Pretty much as soon as I discovered vitae I started converting my existing, bland CV from a Microsoft Word document to RMarkdown.
What I started to notice is that the data I wanted to include in my CV didn’t actually live anywhere other than previous versions, which were Word and PDF documents in my case. The examples I looked at from vitae vignettes stored much of this information within the markdown file that is then eventually rendered to a PDF. Something like the following for storing records of education.
library(tidyverse)
education.dat <- tribble(
  ~ Degree, ~ Year, ~ Institution, ~ Where,
  "Doctor of Philosophy, Wildlife and Fisheries Sciences", "2018", "South Dakota State University", "Brookings, SD",
  "Master of Science, Wildlife and Fisheries Sciences", "2013", "South Dakota State University", "Brookings, SD",
  "Bachelor of Science, Biology", "2010", "University of Wisconsin-Oshkosh", "Oshkosh, WI"
)
This constructs a data frame that is structured like this:
| Degree | Year | Institution | Where |
|---|---|---|---|
| Doctor of Philosophy, Wildlife and Fisheries Sciences | 2018 | South Dakota State University | Brookings, SD |
| Master of Science, Wildlife and Fisheries Sciences | 2013 | South Dakota State University | Brookings, SD |
| Bachelor of Science, Biology | 2010 | University of Wisconsin-Oshkosh | Oshkosh, WI |
This format is tailored to feed into vitae functions, specifically the vitae::brief_entries() and vitae::detailed_entries() functions. I found these functions and their arguments to be a very good way to conceptualize the elements of a CV, but storing data specifically in this format means that the underlying information was summarized and not tidy (Wickham 2014).
If you’re not familiar with the concept of tidy data, the basic principle is that when you’re organizing tabular data every row should represent an observation, every column should represent a variable, and each type of observational unit should have its own table (Wickham 2014). The education table shown above is a good example of untidy data. The “Degree” column holds two variables, which would be more correctly identified as degree (Doctor of Philosophy) and major (Wildlife and Fisheries Sciences). The benefits of organizing data according to these principles are difficult to overstate, so I set out to organize my CV data in a tidy format.
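As a minimal sketch of what tidying that “Degree” column could look like, the snippet below splits it into two columns with tidyr::separate(). The object names (education_untidy, education_tidy) and the new column names (degree, major) are my own choices for illustration, and only two rows of the table are reproduced.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Untidy: the Degree column holds two variables (degree and major)
education_untidy <- tribble(
  ~Degree,                                                 ~Year,
  "Doctor of Philosophy, Wildlife and Fisheries Sciences", "2018",
  "Bachelor of Science, Biology",                          "2010"
)

# Tidy: split on the comma so each variable gets its own column
education_tidy <- education_untidy %>%
  separate(Degree, into = c("degree", "major"), sep = ", ")

education_tidy
```

Going the other direction (pasting degree and major back together for display) is a one-liner, which is part of why storing the tidy form costs very little.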
Storing CV Data
I ended up storing the information that I used in my CV in two formats. Most could be easily organized and stored in a spreadsheet, whereas items like peer-reviewed journal articles are a special case which I handled separately with a citation manager.
Spreadsheet Data
I used a Microsoft Excel version to share this example of my spreadsheet but there’s nothing about what I did that couldn’t be accomplished in any spreadsheet software. My spreadsheet can be found in the GitHub repository of this blog post and is titled “CV_Example_Workbook.xlsx”. In this post I will show some examples of the types of data that I pulled together for my CV, but this template can definitely be expanded beyond the example shown here. My advice in building further tables in the spreadsheet is to adhere to tidy principles (Wickham 2014), and best practices for data organization in spreadsheets (Broman and Woo 2018).
One other thing I did in setting up my spreadsheet was to use related tables to reduce redundancy. An example of this would be when listing specific tasks/accomplishments associated with a job in my work history. Rather than copying all of the relevant columns for the job (i.e. title, company, company address, supervisor, etc.) with each task, I used a related table that just has the columns needed to identify the specific job, and then a column for tasks. You can see an example of this strategy in the example workbook that has a “work” sheet for jobs themselves, and a related “work_details” sheet for listing specific tasks within those jobs.
One last note I’ll add regarding the spreadsheet data is I have found it useful to document the most specific form of data possible for any given type, even if I never intend to use that degree of specificity in my CV. A good example is recording the specific date when a degree was conferred. Although it’s not conventional to include this level of specificity on a CV, there are times when you may need it, particularly for actual job applications. If you have the specific date it is trivial to convert that to a more conventionally used format, such as Month-Year, but it can be unnecessarily painful to dig through records and find the actual date your degree was conferred.
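To illustrate why the full date is the better thing to store, here is a small sketch using base R’s format(). The date itself is made up for the example and is not an actual conferral date.

```r
# Hypothetical conferral date, stored at full precision
conferred <- as.Date("2018-05-05")

# Coarser display formats are derived on demand
conferred_year <- format(conferred, "%Y")           # year only
conferred_month_year <- format(conferred, "%B %Y")  # full month name and year; the month name follows the session locale

conferred_year
conferred_month_year
```

The reverse is not possible: a stored “2018” can never be turned back into the exact day.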
Citations
Anything that might be cited - e.g. peer-reviewed publications, technical reports, presentations, software - has established guidelines and associated software to conform to those guidelines. I definitely recommend tracking personal records in a citation manager. I’ve found Zotero to be exceptional (and free!), but there are other options available. My recommendation is to store any work that can be used as a citation in a citation manager, and attach a full-text copy if at all possible. This advice applies outside of publications - for example, if you give a presentation you can print your slides to a PDF and attach to the citation entry.
I usually import citation metadata from DOI entries whenever available. In Zotero, I typically do this by pasting the DOI address into Zotero, and it will automatically populate the relevant fields. One thing I’ve found, however, is that text formatting of citation metadata isn’t always exactly correct in these entries. A specific issue I found in the process of building my CV was that species names weren’t displayed in italics. The example in my CV is under the DOI address https://doi.org/10.1007/s10592-016-0820-y. You could download the citation metadata from this paper by clicking “Cite this article” and then “Download citation”, or you could just paste the DOI address (10.1007/s10592-016-0820-y) into Zotero, but either way you’ll end up with the species name for Mountain Sucker (Pantosteus jordani) coming through in standard formatting, not in italics. The way I found to override this was to use LaTeX markup in the Title field within Zotero. To specify italics in this example I changed that part of the title to: \textit{Pantosteus jordani}. Another thing to be aware of is that however the words in titles are capitalized in your Zotero library is how they will come through when exporting.
One other “trick” I used in Zotero was to include a journal article that is currently in review. Journal articles don’t have a DOI until they’re published, so I manually entered the fields that I did know which included title, authors, and publication. I also added “In Review” in the “Volume” field so I could handle this paper separately in my CV construction.
The process of getting citation information from the citation manager software to RMarkdown is to export a BibTex bibliography file to the project directory for your CV RMarkdown file. From Zotero, this is accomplished by highlighting the entries you want to export, then right-clicking and selecting “Export Items”; the default output format from Zotero is BibTex, so just make sure to be saving in your project directory. I named that file “publications.bib”.
One thing I’ll point out here is that I ended up formatting presentations as “Conference Paper” in Zotero. When entries are exported from Zotero to a BibTex file they are classified according to standard entry types of BibTex. There are 14 types, and by choosing “Conference Paper” in Zotero they will be assigned to the “inproceedings” entry type. This matters because citations will be formatted according to something called a CSL file which sets explicit formatting rules depending on the entry type. There is a “Presentation” Item Type in Zotero, but these will all be assigned a “misc” entry type when exported to BibTex. You can specify formatting for “misc” entries but I thought it would be better to be specific where I could be.
Building the CV in R
Initial Steps
To generate PDF documents from RMarkdown you will need to set up something called LaTeX on your local machine. You can do so using the tinytex package (Xie 2025) with the following code.
tinytex::install_tinytex()
You will only have to do this once. Now, to use templates from vitae make sure to install that package as well. You can do so from the RStudio menu by clicking Tools -> Install Packages and searching for vitae, or simply run the following code.
install.packages("vitae")
Starting from template
To start, I opened a new RMarkdown file; on the left of the dialog box there is an option to select “From Template”. This will show templates that are available from the vitae package. To view examples of these templates check out the vitae README, and if you’re interested in learning how to make your own custom template see this vignette. For my purposes, I thought the “awesomecv” template looked good, so I selected that option, which showed up as “Curriculum Vitae (Awesome-CV format)”. In the dialog box, it indicates that the template contains multiple files, so I needed to create a new directory for the files. I named mine “CV_Example”. Now that I was done with the dialog box I clicked OK and it took me to an RMarkdown file that is populated with the Marie Curie example used in all of the vitae templates.
Customizing bibliography formatting
Citations in any RMarkdown document are formatted according to a CSL (Citation Style Language) file. When using a vitae template the American Psychological Association (APA) formatting rules are used by default. There are many other styles available in the Zotero style library. In my field of fisheries biology, a safe bet is to follow guidelines of the American Fisheries Society (AFS). I found a couple of existing CSL files in the Zotero style library that are associated with AFS journals, but they didn’t exactly match the AFS style guide. I wanted to change a couple of things to customize how my bibliography items were formatted so I created a custom CSL file for the 3 item types included in my CV. Broadly, my custom file makes peer-reviewed journal articles meet the AFS style guide. I also made technical reports match the suggested citation format given by the Idaho Department of Fish and Game because all but one of my entries in that category were from my time working for them. That CSL file (felts-custom.csl) is included in the GitHub repository for this blog.
YAML Front Matter
As with any RMarkdown file, there is a specialized bit of code at the beginning that is called the YAML front matter. This will define things like title, authors, etc. in a manuscript. It is also where you can set defaults for font types, colors, and bibliographies. In the template I selected there were a number of options already populated. For me, the things I wanted to include were name, surname (i.e. last name), position, address, email, and date so I just deleted the fields that I didn’t want to keep, and edited the remaining fields to contain my information. There was also some information about the output, which is basically saying to use the “awesomecv” template when rendering, so I left that alone. Also, note that the date is automatically generated to display the month and year at the time you run the code to produce the PDF. I thought this was a nice touch so I left that part intact. One other thing I wanted to adjust from the original template was the color of the text. There was some red styling incorporated, and I just wanted everything to be black, so I defined the color with “headcolor” and a hexadecimal code (“414141”), which is a very dark gray that reads as black on the page. You can change the headcolor to anything you want, and to see options for hexadecimal colors click here. Finally, to apply the custom CSL file that I created, I included an argument for csl in the YAML header.
You may notice in the example spreadsheet that I did store a copy of information that may go in the YAML front matter in the sheet named “personal”. I believe it is possible to dynamically update the information in the YAML front matter from data that is read in to the R session, but for my purposes it was simple enough to just enter manually. I have found it useful to have a central tracking system for some of these identifiers such as ORCID so I don’t have to go and look them up if I need them for other uses, so I did have that type of information stored in my spreadsheet as well.
Once I updated with my info and changed the header color to black, my YAML front matter looked like this:
---
name: Eli
surname: Felts
position: "Fisheries Biologist"
address: "Lenore, Idaho"
email: "elifelts@gmail.gov"
date: "`r format(Sys.time(), '%B %Y')`"
csl: felts-custom.csl
headcolor: 414141
output:
  vitae::awesomecv:
    page_total: true
---
Note that name, surname, position, address, and email from that front matter will be used in the header at the beginning of the rendered PDF file.
RMarkdown Body
In an RMarkdown file, R code is inserted in things called chunks, and there are several options that can be defined about what to do with that code. For example, sometimes you just want to display code for demonstration purposes and not actually run it, so you would include an argument of eval=FALSE to define that you don’t want that code evaluated. A good practice is to set global options for an RMarkdown file at the beginning, and those are already present in this template using the syntax knitr::opts_chunk$set(). Leave these in the default settings, as this will make it so that your underlying code runs but the code, and associated warnings and messages, don’t show up in the rendered PDF.
knitr::opts_chunk$set(echo = FALSE,
                      warning = FALSE,
                      message = FALSE)
I also like to include a chunk at the beginning of my RMarkdown documents to load any packages I’ll be using later. Here I’ll just need vitae, tidyverse, and readxl. If you’re using different spreadsheet software to store CV data you may need a different package here, like googlesheets4 if you’re using Google Sheets.
library(vitae)
library(tidyverse)
library(readxl)
Import CV Data from Spreadsheet
Now I was ready to start linking to my spreadsheet, so here I just defined the location of the overall spreadsheet so that I don’t have to type the entire path every time I want to access it.
spreadsheet.path <- "CV_Example_Workbook.xlsx"
Build CV Body
Education
Now I started the first section in the body of my CV, which was education. I read in that sheet from my workbook, and I also wanted to combine and reformat a couple of columns to fit into the parameters of the vitae functions. For example, I wanted my degrees, which are specified by the what argument in vitae::detailed_entries(), to display as both the degree (e.g. Bachelor of Science) and the major (e.g. Biology). To do this I used stringr::str_c() to make a new column named display_degree. I used this convention throughout this process with columns named display_* to identify columns that I constructed specifically to fit into the vitae functions and show exactly the information I wanted in the rendered PDF output. I used a similar process to get just the year of degree completion (display_year) and the city/state where I went to school (display_where). For display_where, I wanted the state to be abbreviated, but I had it stored as the full state name in my spreadsheet. To accomplish this, I made a key that relates the full state names to abbreviations (both are available in base R) and joined those to the data frame when I read them in.
state_key <- tibble(institution_state=state.name,
                    state=state.abb) # make a key linking full state names to abbreviations
education.df <- read_excel(spreadsheet.path,sheet="education") %>%
  left_join(state_key,by="institution_state") %>% # join in state abbreviations to full state names
  mutate(display_degree=str_c(degree,major,sep=", "), # combine degree and major in a single column
         display_year=year(completion_date), # show just the year degree was completed, from complete date
         display_institution=institution, # No changes to how institution was stored, just making a new column to fit my naming convention
         display_where=str_c(institution_city,state,sep=", ")) %>% # combine city and state of institution in a single column
  arrange(desc(display_year)) # arrange in reverse chronological order
So now I had a table that had all my display_* columns formatted as I wanted to feed into vitae::detailed_entries(). I’ll also note here that to make section headings (e.g. “Education”) a single hash mark will identify the header in your RMarkdown document. So, my Education section ended up looking like this:
# Education
detailed_entries(data=education.df,
what=display_degree,
when=display_year,
with=display_institution,
where=display_where)
Work Experience
The next section of my CV was work experience. One difference here from education is that I wanted both job information that is stored in my “work” sheet, and the specific tasks that are stored in the related sheet “work_details”. To get a bullet list below a job using vitae::detailed_entries(), the why argument should be provided in a list column. So, my first step here was to read in the work details, and compress into a single row for each job with all of the tasks in an associated list column. Note that dplyr::group_by() uses both title and company to make sure I was getting unique positions. For my particular work history, every title has been unique, but if you’ve held the same title (e.g. Fisheries Biologist) for multiple agencies you’ll need to use some combination like this to get unique positions.
work_details.df <- read_excel(spreadsheet.path,sheet="work_details") %>%
  group_by(title,company) %>% # group by title and company
  summarize(display_tasks=list(task)) # make tasks associated with each title/company a single list column
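If list columns are new to you, the following self-contained sketch shows what that summarize step produces. The job and task values are placeholders I made up, standing in for the “work_details” sheet, and the .groups = "drop" argument is an optional addition of mine that simply returns an ungrouped table.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the "work_details" sheet: one row per task
work_details_example <- tribble(
  ~title,                ~company,   ~task,
  "Fisheries Biologist", "Agency A", "Led field surveys",
  "Fisheries Biologist", "Agency A", "Wrote annual reports",
  "Fisheries Biologist", "Agency B", "Managed databases"
)

# Collapse to one row per position, with all of its tasks held in a list column
tasks_by_job <- work_details_example %>%
  group_by(title, company) %>%
  summarize(display_tasks = list(task), .groups = "drop")

tasks_by_job
lengths(tasks_by_job$display_tasks) # number of tasks stored for each position
```

Because the title is the same in every row here, grouping by title alone would have merged the two agencies into one position, which is the situation the title/company combination guards against.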
Now I read in the “work” sheet and did some combining and reformatting of some of the columns to fit into the parameters of vitae::detailed_entries() and display those items in an appealing way. After those steps, I used dplyr::left_join() to bring in the associated details in the list column that was created in the last code chunk.
work.df <- read_excel(spreadsheet.path,sheet="work") %>%
  mutate(display_role=title,
         display_company=company,
         display_address=str_c(company_city,company_state,sep=", "), # combine company city and state into a single column
         display_dates=str_c(format(start_date,"%b. %Y"),
                             format(end_date,"%b. %Y"),sep=" - ")) %>% # format start and end dates as Abbreviated Month-Year e.g. Mar. 2025, and combine into a single column with start and end separated by a " - "; e.g. Apr. 2024 - Feb. 2025
  left_join(work_details.df,by=c("title","company")) # join work details list column to associated position, uniquely identified by title and company
Now I constructed the Work Experience section with vitae::detailed_entries():
# Work Experience
detailed_entries(data=work.df,
what=display_role,
when=display_dates,
with=display_company,
where=display_address,
why=display_tasks)
Technical Proficiencies
This was a pretty simple field the way I decided to use it. I read in the “technical” sheet and summarized details in a list column so those would render as a bullet list. One additional step I took was to control the order that each “software” was displayed in my CV. I made a table here using tribble so I could join my desired order to the data I read in from my spreadsheet, then applied dplyr::arrange() to ensure that they were displayed in that order.
technical.order <- tribble(~software, ~cv_order,
                           "Program R", 1,
                           "Microsoft Office",2,
                           "Git",3,
                           "ArcGIS Survey123",4) # make a table to control the order each software will be shown in the CV
technical.df <- read_excel(spreadsheet.path,sheet="technical") %>%
  left_join(technical.order,by="software") %>% # Join software details to order
  group_by(software,cv_order) %>% # group by software and order
  summarize(display_details=list(details)) %>% # make tasks/skills associated with each software a single list column
  arrange(cv_order) # ensure table is arranged according to the assigned order
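One behavior worth knowing with this ordering approach: any software that is in the spreadsheet but missing from the order table gets an NA for cv_order after the left join, and dplyr::arrange() places NA values last. A small sketch with made-up entries (the “QGIS” row is hypothetical and deliberately left out of the order key):

```r
library(dplyr)
library(tibble)

# Order key covering only some of the software
order_key <- tribble(
  ~software,   ~cv_order,
  "Program R", 1,
  "Git",       2
)

# Stand-in for the spreadsheet data; "QGIS" has no entry in the key
skills <- tibble(software = c("Git", "QGIS", "Program R"))

ordered_skills <- skills %>%
  left_join(order_key, by = "software") %>% # unmatched rows get cv_order = NA
  arrange(cv_order)                         # NA values are placed at the end

ordered_skills
```

So a newly added row still shows up in the CV, just at the bottom, until it is given a position in the order table.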
Now I used vitae::detailed_entries(). Note that you don’t always have to provide every possible argument to that function, as here I only included what and why.
# Technical Proficiencies
detailed_entries(data=technical.df,
what=software,
why=display_details)
Peer-reviewed Journal Articles
Now came the first section that used the BibTex file that was exported from Zotero. I wanted to filter the various types (i.e. journal articles, technical reports, and presentations) to put them into separate sections, and I wasn’t sure how the bibliography data would look when it was read into R with vitae::bibliography_entries(). So I read in the file and just inspected it in the RStudio IDE, and found that the column I wanted to filter on was type and the values in that field were “article-journal”, “report”, and “paper-conference”. Remember I also included an “In Review” article, but I wanted that to go under a separate subheading so I needed to filter it out here, and I found that the “Volume” field from Zotero just came through as volume. I also wanted to arrange citations in reverse chronological order, and I found that when read into the R workspace, the date was contained in the issued column.
bib.examine <- bibliography_entries("publications.bib")
So now the process was very simple: I read in the BibTex file using the vitae::bibliography_entries() function and filtered to get only journal articles, then arranged in reverse chronological order.
# Peer-reviewed Journal Articles
bibliography_entries("publications.bib") %>% # read in BibTex file
filter(type=="article-journal",
!volume=="In Review") %>% # Filter to get just peer-reviewed journal articles and drop any with volume indicating "In Review"
arrange(desc(issued)) # Arrange in reverse chronological order
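One caveat about that filter, sketched below with a made-up table rather than real bibliography data: dplyr::filter() drops rows where the condition evaluates to NA, so a published article with an empty volume field would silently disappear along with the “In Review” entry. The titles and the bib_example object are hypothetical, and whether this matters depends on whether any of your entries lack a volume.

```r
library(dplyr)
library(tibble)

# Made-up entries: one published, one in review, one with no volume recorded
bib_example <- tibble(
  title  = c("Published paper", "Submitted paper", "Paper without a volume"),
  volume = c("12", "In Review", NA)
)

# NA == "In Review" is NA, and filter() drops rows where the condition is NA
strict <- bib_example %>%
  filter(!volume == "In Review")

# Keeping rows with a missing volume has to be requested explicitly
lenient <- bib_example %>%
  filter(is.na(volume) | volume != "In Review")

nrow(strict)
nrow(lenient)
```

In my case every published article had a volume, so the simpler filter was enough.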
Now to put the “In Review” article under a secondary heading, I used two hash marks, then applied the same process as the published journal articles except I changed the filter and then edited the volume column to be blank in the bibliography.
## In Review
bibliography_entries("publications.bib") %>% # read in BibTex file
  filter(type=="article-journal",
         volume=="In Review") %>% # Filter to get just peer-reviewed journal articles; the volume filter keeps just "In Review"
  mutate(volume="") # blank out the volume so "In Review" is not printed in the citation
Technical Reports
I applied the same exact process as journal articles here, just changing the filter on type.
# Technical Reports
bibliography_entries("publications.bib") %>% # read in BibTex file
filter(type=="report") %>% # Filter to get just technical reports
arrange(desc(issued)) # Arrange in reverse chronological order
Contributed Presentations
I applied the same exact process as journal articles here, just changing the filter on type.
# Contributed Presentations
bibliography_entries("publications.bib") %>% # read in BibTex file
filter(type=="paper-conference") %>% # Filter to get just Conference Presentations
arrange(desc(issued)) # Arrange in reverse chronological order
Professional Society Memberships
This was another import from my spreadsheet, with a little more code to format dates. I wanted just the start and end year in the display, and if there was no end date, I wanted it to display as “Present”. I also arranged these by end date so they ended up in reverse chronological order. To make this work for memberships that didn’t have an end date, I assigned today’s date as the end.
service.df <- read_excel(spreadsheet.path,sheet="memberships") %>%
  mutate(display_org=str_c(organization,str_c(chapter,"Chapter",sep=" "),sep=", "), # Combined the organization and chapter into a single column
         display_end=ifelse(is.na(end_date),"Present",
                            year(end_date)), # Make "end" show as Present if no end date (still currently a member)
         display_timeframe=str_c(year(start_date),display_end,sep=" - ")) %>% # Combine start and end dates, formatted as year into a range column, e.g. 2010 - 2017
  mutate(end=if_else(is.na(end_date),as_date(today()),
                     end_date)) %>% # for sorting purposes, assign any rows with no end date to have an end date of today
  arrange(desc(end),
          start_date) # arrange descending by end date, then by start date
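Here is the same “Present” logic as a small self-contained sketch, using made-up memberships instead of the spreadsheet. The names memberships_example and sort_end are placeholders of mine, and I used if_else() with as.character() for the display column, which is a slightly stricter variant of the ifelse() call above rather than the exact code in my CV.

```r
library(dplyr)
library(tibble)
library(stringr)
library(lubridate)

# Hypothetical memberships; an NA end_date means still a member
memberships_example <- tibble(
  organization = c("Society A", "Society B"),
  start_date   = as.Date(c("2010-01-01", "2015-06-01")),
  end_date     = as.Date(c("2017-12-31", NA))
)

memberships_display <- memberships_example %>%
  mutate(
    # text shown in the CV: the end year, or "Present" for current memberships
    display_end = if_else(is.na(end_date), "Present", as.character(year(end_date))),
    display_timeframe = str_c(year(start_date), display_end, sep = " - "),
    # sorting helper only: treat current memberships as ending today
    sort_end = if_else(is.na(end_date), today(), end_date)
  ) %>%
  arrange(desc(sort_end), start_date)

memberships_display$display_timeframe
```

Keeping the sort helper separate from the display column means the word “Present” never has to be sorted as text.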
With those formatting steps complete I was ready to use vitae::detailed_entries() again.
# Professional Society Memberships
detailed_entries(data=service.df,
what=display_org,
when=display_timeframe)
Honors Received
One note here is that it just so happened that the length of my CV meant that if I didn’t override any of the automatic formatting, this section started with just the section header at the bottom of one page and the actual entries were on the next. I wanted to override this and move the entire section to the next page, which I did by typing \newpage in the RMarkdown body. After that, this was a very straightforward read in from my spreadsheet and just formatting the display date to only show the year.
awards.df <- read_excel(spreadsheet.path,sheet="awards") %>%
  mutate(display_year=year(date)) # make date show as year
Then I was ready to insert the page break and use vitae::detailed_entries().
\newpage
# Honors Received
detailed_entries(data=awards.df,
what=award,
when=display_year)
References
I only wanted 3 references from my spreadsheet, and wanted them in a specific order so I made a table of the names of the references I wanted to include and the order in which I wanted them displayed. Then I read in my references and used a dplyr::inner_join() with that table so that only matching values would remain. Finally, I adjusted how their contact information was displayed and made sure they were arranged in my desired order.
ref.include <- tibble(name=c("John Erhardt","Ryan Hardy","Joe DuPont"),
                      order=c(1,2,3)) # define which references I want to include, and the order in which to display
ref.df <- read_excel(spreadsheet.path,sheet="references") %>% # read in all references from my spreadsheet
  inner_join(ref.include,by="name") %>% # inner join with the names and order I wanted to include, this will only keep references that match ref.include
  mutate(display_title=str_c(title,company, sep=", "),
         display_contact=email,
         display_name=name) %>%
  arrange(order) # arrange by the order I assigned
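Because an inner join only keeps exact matches, a misspelled name in the include table drops that reference without any warning. A quick check with dplyr::anti_join() can catch this; the sketch below uses placeholder names and emails (including a deliberate typo) and is an optional safeguard, not something from my actual CV code.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the "references" sheet
all_refs <- tibble(
  name  = c("Person A", "Person B", "Person C", "Person D"),
  email = c("a@example.com", "b@example.com", "c@example.com", "d@example.com")
)

# References to include and their display order; "Persn B" is a deliberate typo
ref_pick <- tibble(
  name  = c("Person C", "Person A", "Persn B"),
  order = c(1, 2, 3)
)

# Only exact name matches survive the inner join
chosen <- all_refs %>%
  inner_join(ref_pick, by = "name") %>%
  arrange(order)

# Names requested but not found in the sheet
unmatched <- ref_pick %>%
  anti_join(all_refs, by = "name")

chosen$name
unmatched$name
```

If unmatched has any rows, a requested reference is missing from the rendered CV.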
Now I was ready to use vitae::detailed_entries().
# References
detailed_entries(ref.df,
what=display_title,
with=display_name,
where=display_contact)
Rendering
At this point I was ready to run the code and see how it came through in a PDF. Once everything was saved I hit the “Render” button. Here is the output I got:
I was pretty happy with this end product. There are a couple of things I will tweak depending on specific job descriptions, but it’s a much nicer looking product than my previous Word file, and making incremental changes from here will be very easy, especially if I keep my spreadsheet and Zotero library up to date.
A note on CSL Styling
By far the biggest sticking point in this project for me was getting the bibliography entries to come through in exactly the formats I wanted, which included the AFS style guide for journal articles, and the recommended format from IDFG for technical reports. I relied on Han Zhang’s example for getting an introduction to customization via a CSL file. Once I understood that was how to customize, I knew I needed a CSL file that would make my citations look how I wanted.
This exercise jogged my memory that I had come across CSL files before, and that I had one saved that was for the American Fisheries Society style. I ended up with a couple problems in using that file without editing. First, the peer-reviewed journal entries were not exactly in accordance with the AFS style guide. Specifically, the file I had included both volume and issue in the bibliography, whereas the AFS style guide indicates to only include the volume. Honestly, I probably would’ve just left this for my CV but I knew that in the future when I wrote journal articles to be submitted to AFS journals I wanted to have a CSL file that would generate exactly correct bibliography entries. I also wanted to be able to specify how technical reports were treated in following the recommended format from IDFG.
I had zero familiarity with writing CSL prior to this project. My approach was to try specifying what I wanted to ChatGPT and seeing how it would do. I’ve been slow to adopt AI, but I’ve been using it more lately and have found it’s helpful in some instances and not so great in others. In this exercise it was decidedly “meh”, and probably not better or quicker than the old school method of googling and browsing answers on Stack Overflow. It did not come up with a workable solution on the first few iterations, but what it provided was enough for me to get the basic idea of how CSL code was structured. From that shell, I was mostly able to edit the initial code it gave me so that my bibliography rendered as I had wanted it. I got to the point where all of the components were there and in the order I wanted, but the way author names were listed wasn’t quite right. I remembered that the AFS file I had tried before did that part correctly, so I used the part of that code that controlled how the author names were handled and that got me to where I wanted to be for my CV.
I’m certain that my CSL code would make someone who is more familiar with that language want to puke, but it worked for my purposes here, and in the famous words of Hadley Wickham, “the only way to write good code is to write tons of shitty code first”. So I’ve at least taken that first step in writing some shitty code!
Summary
Overall, this project was relatively painless, largely due to the vitae package itself and the associated documentation. The examples linked in the vitae README were extremely helpful for me to see how others had tackled this task and got me a long way toward my finished product.
The only real difference (I thought) in my approach to this task was how I stored the non-citation data that feeds into my CV. I’ll confess that I didn’t look at every single example linked in the README before running off to build my CV and starting this blog post. As I finished this post I was looking to see if any of those examples used a similar approach and actually most of them did, in some form or fashion. Funny enough, it seems the examples I primarily worked from (Mitchell O’Hara-Wild and Han Zhang) are among the only examples that didn’t link to some sort of other data storage. When I took a glance at some of those that did store data I noticed a couple of things that were a little different than my approach:
- Several used a data folder within the “R” directory that still housed things in R scripts using things like tribble to construct the tables; this is how R packages typically structure data files. Overall, I’m a spreadsheet “hater” but I have to say that for this narrow use case I kind of like them better for keeping records than constructing the tables within R. Just a personal taste thing, really.
- Most have examples of untidy and/or nonspecific data. In several of the examples, csv files were used for the various data types, similar to what I did in the various tabs within my workbook. Many had at least some columns that were, strictly speaking, not tidy. Usually it was things like having the location listed as “Oshkosh, WI” rather than a column for city and another for state. This really isn’t going to be a big deal but for me personally I prefer the strictly tidy form as a starting point.
- For academic-type CVs that include citations using BibTex files, the examples I found used separate BibTex files for different publication types (e.g. journal articles, presentations), whereas I used a single file and filtered within R to split out the different types.
So after going through the process of making my CV data tidy, I did wonder whether it was worth the effort. Honestly, from my own starting point I think it was a pretty decent approach; I needed a better way to store these records and what I ended up doing wasn’t more time consuming than other approaches used in some of the examples linked in the vitae README. That said, if my information was already structured like some of theirs, I probably wouldn’t bother changing. For me, the benefits of tidy data really come through when summarizing and iterating over groups, and when building plots using ggplot2. I won’t be doing any of those things with these data. A lot of the untidy examples I found are also very minor, and could be converted to the strictly tidy form with a couple of lines of code if needed. All told, I do think this was a good exercise. I’ll be in much better shape to update my CV the next time the need arises, or to customize a little bit for specific job postings.