Data analysis with Unix

GitHub Classroom Assignment

How are Unix commands used for data analysis?

The quiz covers the third lecture and the reading from the Biostar Handbook, chapter 18: Data analysis with Unix (p. 175).

The questions will ask you about the content of the file at

http://data.biostarhandbook.com/data/SGD_features.tab

Download this file onto your computer before venturing forth.
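The download can also be scripted so that every answer script works from the same local copy. The following is a minimal sketch, assuming `curl` is installed; the function name `fetch_features` and the default file name are illustrative choices, not part of the assignment.

```shell
#!/usr/bin/env bash
# Sketch: fetch the data file once and reuse the local copy afterwards.

URL="http://data.biostarhandbook.com/data/SGD_features.tab"

# fetch_features [DEST]: download the file to DEST (hypothetical helper).
# The download is skipped when DEST already exists and is not empty.
fetch_features() {
    local dest="${1:-SGD_features.tab}"
    if [ -s "$dest" ]; then
        echo "Using existing copy: $dest" >&2
        return 0
    fi
    # -L follows redirects, -o names the output file.
    curl -L -o "$dest" "$URL"
}
```

Calling `fetch_features` with no argument would save the file as `SGD_features.tab` in the current directory.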

::: tip
Additional information on the SGD_features.tab file can be found in http://data.biostarhandbook.com/data/SGD_features.README
:::

Instructions

  1. For each question, create a script file, for example question_1.bash, question_2.bash, …
  2. Have each script analyze the data and produce output with the correct information.
  3. Push your code up to GitHub to receive feedback on your answers!
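One possible layout for an answer script is sketched below. The assignment does not specify how a script should locate the data file or format its output, so the optional path argument, the default file name, and the `answer` function are all assumptions made for illustration. The pipeline inside `answer` is only a placeholder that counts lines.

```shell
#!/usr/bin/env bash
# question_1.bash (hypothetical layout): print one answer to standard output.

# Path to the data file; defaults to a copy in the current directory.
DATA="${1:-SGD_features.tab}"

# answer FILE: the pipeline that computes the answer goes here.
# As a placeholder, this one counts the lines in FILE.
answer() {
    # tr strips the padding that some versions of wc add.
    wc -l < "$1" | tr -d ' '
}

# Report a missing data file instead of silently printing nothing.
if [ -f "$DATA" ]; then
    answer "$DATA"
else
    echo "Data file not found: $DATA" >&2
fi
```

Run it as `bash question_1.bash` from the directory that holds the data file, or pass the path as the first argument.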

Questions

  1. How many lines does this file contain?
  2. How many lines match the pattern gene?
  3. How many lines match the pattern ORF?
  4. How many lines match the pattern ORF in the second column?
  5. Which word of the second column appears 50 times?
  6. How many times does the word Z3_region appear in the second column?
  7. How many features are located on the forward strand?
  8. How many features have no strand information listed?
  9. The standard gene name column lists each gene name only once. (True or False)
  10. More rows have feature types than feature names. (True or False)
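Most of these questions reduce to a few standard tools: `grep` to match patterns, `cut` to select a tab-separated column, and `sort` with `uniq -c` to tally values. The sketch below runs them on a small made-up sample rather than on the real file. The helper names, the sample rows, and the column numbers are illustrative assumptions, so check the README for the actual column layout.

```shell
#!/usr/bin/env bash
# Sketch: building blocks for column-based questions, shown on made-up data.

# count_matches PATTERN FILE: lines that contain PATTERN anywhere.
count_matches() {
    grep -c "$1" "$2"
}

# count_in_column PATTERN N FILE: lines whose column N contains PATTERN.
count_in_column() {
    cut -f "$2" "$3" | grep -c "$1"
}

# column_counts N FILE: tally each distinct value in column N, most frequent first.
column_counts() {
    cut -f "$1" "$2" | sort | uniq -c | sort -rn
}

# count_empty N FILE: rows where column N is empty.
count_empty() {
    awk -F '\t' -v col="$1" '$col == "" { n++ } END { print n + 0 }' "$2"
}

# A made-up four-column sample; the real file has more columns and rows.
sample=$(mktemp)
printf 'id1\tORF\tYAL001C\tW\n'  >  "$sample"
printf 'id2\tORF\tYAL002W\t\n'   >> "$sample"
printf 'id3\tCDS\tORF_like\tC\n' >> "$sample"

count_matches ORF "$sample"        # prints 3: "ORF_like" in column 3 also matches
count_in_column ORF 2 "$sample"    # prints 2: only column 2 is searched
column_counts 2 "$sample"          # one line per distinct value, with its count
count_empty 4 "$sample"            # prints 1: the second row has no value in column 4

rm -f "$sample"
```

The difference between the first two results shows why matching a pattern on the whole line and matching it within a single column can give different counts.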