How to Upload Data in R From .txt
Importing Data into R
A tutorial about data analysis using R
Dr Jon Yearsley (School of Biology and Environmental Science, UCD)
- Objectives
- Organise yourself!
- Data Workflow
- Format your data (tidy data)
- Data frames
- Importing spreadsheet data
- Summary of the topics covered
- Further Reading
How to Read this Tutorial
This tutorial is a mixture of R code chunks and explanations of the code. The R code chunks will appear in boxes.
Below is an example of a chunk of R code:
    # This is a chunk of R code. All text after a # symbol is a comment
    # Set working directory using setwd() function
    setwd('Enter the path to my working directory')
    # Clear all variables in R's memory
    rm(list= ls())   # Standard code to clear R's memory
Sometimes the output from running this R code will be displayed after the chunk of code. R output will be preceded by ##.
Here is a chunk of code followed by the R output
    2 + 4   # Use R to add two numbers
    ## [1] 6
Objectives
The objectives of this tutorial are:
- Demonstrate good practice in data organisation
- Introduce plain text file formats for data
- Explain data import into R
Organise yourself!
Before you start importing data into R you should take time to organise your workspace on your computer:
- Create a folder on your computer to contain all your work for this particular project (e.g. a folder called DataModule)
- Within this project folder create another folder called data. This will hold all the raw data files. These raw data files should not be changed.
- Inside this project folder create a text file called MyFirstScript.R. You can use RStudio for this (use the File->New File->R Script menu option) or any basic text editor (e.g. Notepad, TextEdit, gedit, emacs). This file will be your R script that will contain all the commands for R. The .r or .R suffix is the standard suffix for an R script.
- If you are starting a big project consider creating separate folders for: R scripts, figures, output from the R script
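The folder layout described above can also be created from within R. The following is a minimal sketch, assuming the project folder is placed inside a temporary directory so that running it does not touch your real files; the variable names project_dir and script_file are illustrative, and you would replace the path with your own.

```r
# Sketch: build the project layout from within R.
# Assumption: the project lives in a temporary directory for this demo.
project_dir <- file.path(tempdir(), "DataModule")

# Create the project folder and the data sub-folder for the raw data files
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Create an empty R script inside the project folder
script_file <- file.path(project_dir, "MyFirstScript.R")
file.create(script_file)

# List what was created in the project folder
list.files(project_dir)
```

Creating the folders from a script rather than by hand means the same layout can be reproduced on another computer.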
Your first R script
Now you have created the file MyFirstScript.R you should put some header text at the start of the file to explain what the R script will do. This was described in tutorial 1.
Video Tutorial: Creating a new R script with RStudio (1 min)
The text should have a short explanation of the R script followed by your name and the date you wrote the R script. Each line should start with a # so that the text is not interpreted by R (this text is for humans so they understand what the file is intended to do). Here is an example,
    # ********** Start of header **************
    # Title: <The title of your R script>
    #
    # Add a short description of the R script here.
    #
    # Author: <your name> (email address)
    # Date: <today's date>
    #
    # *********** End of header ****************
    # Two common commands at the start of an R script are:
    rm(list=ls())          # Clear R's memory
    setwd('~/DataModule')  # Set the working directory
    # Replace '~/DataModule' with the name of your own directory
    # ******************************************
    # Write your commands below.
    # Remember to use comments to explain your commands
Writing clear R scripts
An R script isn't just telling the computer how to perform calculations on your data. It is also explaining your working to other human beings.
"Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do." – Donald E. Knuth
To make your R scripts usable by humans they must be clearly commented (using the # symbol to start a comment) and clearly organised.
As you write an R script consider these questions:
- Does your R script look well organised (e.g. is it well spaced, are lines indented logically)?
- Could someone else read the R script and understand the basic idea?
- Could someone else modify your R script relatively easily?
- In a couple of months' time could you quickly read and edit your own R script?
Professional data analysts take clarity very seriously. Here are some links to R coding style guides:
- Google's style guide, https://google.github.io/styleguide/Rguide.xml
- Hadley Wickham's style guide, http://adv-r.had.co.nz/Style.html
- http://www.stat.ubc.ca/~jenny/STAT545A/block19_codeFormattingOrganization.html
- http://nicercode.github.io/blog/2013-04-05-why-nice-code/
Data Workflow
Below is a schematic of the workflow for handling data.
In this tutorial we will consider formatting data, in the next tutorial we'll discuss importing data, and then we'll start to consider exploring the data using graphics and numerical summaries.
Format your data (tidy data)
The workflow starts long before you analyse your data. It starts even before you have your data in some computer software.
Organising your data should follow tidy data guidelines (see below) and be planned before you collect your data. The format of the data should be finalised before importing the data into R. It is often easiest to tidy your data using a spreadsheet program before you import the data into R.
Well organised data from the start will make your life a lot easier and your data import as painless as possible.
Six guidelines for tidy data
When tidying your data you should ensure that:
- each variable has its own column
- each row is an observation
- the top of each column contains the name of the variable
- there are no blank columns or blank rows between data
- all data in a column has the same type (e.g. it is all numerical data, or it is all text data)
- data are consistent (e.g. if a binary variable can take values 'Yes' or 'No' then only these two values are allowed, with no alternatives such as 'Y' and 'N')
PDF Summary: This PDF document reiterates the concept of tidy data
The link to the PDF is: http://www.ucd.ie/ecomodel/pdf/TidyData.pdf
Poorly vs well formatted data
The data set shown in the figure below is an example of poorly formatted data. The data set contains data on the lead concentrations (ppm) from three species of fish (whitefish, sucker and trout). Two types of sample were collected: samples from fillets of fish and from whole fish. The data has three variables: lead concentration, species of fish and type of fish sample.
How would you improve the format of the poorly formatted data shown in the figure? (Hint: use the six guidelines above)
The second figure shows some well formatted data that follows the tidy data guidelines: each column represents a single variable and each row an observation.
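As a rough illustration of what the tidy version of the fish data looks like inside R, the sketch below builds a small data frame with the three variables described above. The concentration values and the column names (lead_ppm, species, sample) are invented for illustration and are not taken from the real data set.

```r
# Sketch of tidy data for the fish example.
# The concentrations below are invented for illustration only.
fish <- data.frame(
  lead_ppm = c(0.12, 0.35, 0.08, 0.41, 0.22, 0.30),
  species  = c("whitefish", "whitefish", "sucker", "sucker", "trout", "trout"),
  sample   = c("fillet", "whole", "fillet", "whole", "fillet", "whole")
)

# One row per observation, one column per variable
fish

# Check that the text columns only contain the allowed values
all(fish$species %in% c("whitefish", "sucker", "trout"))
all(fish$sample %in% c("fillet", "whole"))
```

The two checks at the end correspond to the consistency guideline: any misspelt or abbreviated category would make them return FALSE.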
Data frames
A data frame is R's name for spreadsheet data (e.g. data organised in a grid, like Excel). R stores the vast majority of data as a data frame and uses data frames when analysing data.
A data frame forces the data to be well organised.
- Each column is a variable. The name of this variable becomes the name of the column.
- Each row corresponds to an observation. This means that values in the same row are data collected about the same object. Rows can also have names.
Below is an example of a data frame (called airquality) that contains data on the air quality in New York from May - September 1973 (this is a data set that is built in to R).
    # The airquality data is a built-in dataset
    # First 10 rows of the airquality data frame
    head(airquality, n= 10)
    ##    Ozone Solar.R Wind Temp Month Day
    ## 1     41     190  7.4   67     5   1
    ## 2     36     118  8.0   72     5   2
    ## 3     12     149 12.6   74     5   3
    ## 4     18     313 11.5   62     5   4
    ## 5     NA      NA 14.3   56     5   5
    ## 6     28      NA 14.9   66     5   6
    ## 7     23     299  8.6   65     5   7
    ## 8     19      99 13.8   59     5   8
    ## 9      8      19 20.1   61     5   9
    ## 10    NA     194  8.6   69     5  10
You can type ?airquality to display the help file for this data set. The data frame has 153 rows (observations) and 6 columns (variables measured). The 6 columns contain data on: ozone concentrations (parts per billion), solar radiation, wind speed, air temperature, month and day of observation. You can see that each column has a name corresponding to the data for that column.
The structure of the data frame can be viewed using the str() function
    # Display the structure of the airquality data frame
    str(airquality)
    ## 'data.frame':    153 obs. of  6 variables:
    ##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
    ##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
    ##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
    ##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
    ##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
    ##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
The str() function shows that this is a data frame with 153 observations (rows) and six variables (columns). It also shows the data types of the variables: Wind is a numerical variable (i.e. continuous) and the other variables are all integers (i.e. whole numbers).
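The same information can be pulled out piece by piece with a few other base R functions. This is a short sketch using only the built-in airquality data; which of these functions you use is a matter of taste.

```r
# Inspect the built-in airquality data frame piece by piece
dim(airquality)            # number of rows and number of columns
names(airquality)          # names of the columns (variables)
sapply(airquality, class)  # data type of each column
summary(airquality$Wind)   # numerical summary of a single column
```

The $ symbol selects one column of a data frame by name, here the Wind column.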
Tidy data in R is described in more detail on this web page: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
Tibbles
A recent development (circa 2016) is an improved data frame called a tibble. We will not discuss these new data frame objects here, but you can read about them at https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html.
Don't Panic! Tibbles are very similar to data frames.
The important point to know is that if you use RStudio's GUI interface to import data then your data will be stored in a tibble, not a data frame.
Importing spreadsheet data
To start working with data in R you need to import your data into R. You are aiming to have a data frame that contains your data.
The simplest way to import data into R is from a text file (https://en.wikipedia.org/wiki/Text_file). Text files (sometimes called flat files) can be read by any computer operating system and by many different statistical programs. Saving data as a simple text file makes your data highly transportable.
Importing data from software specific formats (e.g. Excel's .XLSX format, Minitab's .MTW format, SPSS's .SAV format or SAS's .SAS format) is possible (e.g. using RStudio's Import Dataset GUI). If you want your data to be easily shared with other people then use a text file to store your data.
We advise you to:
- save your data as a text file (software, such as Excel, often has an option to save data as plain text)
- organise data with columns corresponding to different variables before exporting to the text file
- use a visible text character to delimit each column (commonly a comma or semi-colon). Using an invisible character (e.g. a space or a TAB) is not recommended because these characters all look the same at first glance.
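If your data are already in R, the advice above can be followed with base R alone. The sketch below exports the built-in airquality data frame to a comma delimited text file and reads it back; the file name and the use of a temporary directory are assumptions made for the demonstration.

```r
# Sketch: export a data frame to a comma delimited text file
csv_file <- file.path(tempdir(), "airquality.csv")  # hypothetical file name

# row.names = FALSE stops R writing the row numbers as an extra column
write.csv(airquality, file = csv_file, row.names = FALSE)

# Look at the first three lines of the raw text file
readLines(csv_file, n = 3)

# Read the file back in to confirm that the export worked
air2 <- read.table(csv_file, header = TRUE, sep = ",")
dim(air2)
```

Looking at the raw lines shows the visible comma delimiter and the column names in the first line of the file.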
General advice on importing data into R can be found at https://cran.r-project.org/doc/manuals/r-release/R-data.html
Converting data to a CSV text file
A comma separated values file (CSV file) is the most common format for a text file that contains data.
Here are a few video tutorials on converting data into a CSV text file so that it is suitable for import into R.
Video Tutorial: Converting data from EXCEL to a CSV format (3 mins)
Video Tutorial: Converting data from Googlesheets to a CSV format (1 min)
Viewing text files
Before importing a text file into any software package it is a huge help if you can look at it in a text editor. Text files can contain characters that are normally invisible (e.g. spaces, tabs and end of line markers). If a text editor is going to be of use it must be able to display all the characters in a file.
Three text editors that can do this are:
- notepad++ is a free program for Windows operating systems
- BBedit is a free program for Mac OSX operating systems
- emacs is a GNU open source program primarily for Linux operating systems
On Linux systems the cat -A command from the terminal is also useful.
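R itself can also be used to reveal invisible characters. The following is a minimal sketch, assuming a small demonstration file written to a temporary directory (the file name and contents are invented): readLines() returns each line as a character string, and printing those strings shows a TAB as the escape sequence \t.

```r
# Sketch: reveal invisible characters in a text file using R
demo_file <- file.path(tempdir(), "demo.txt")  # hypothetical file

# Write two lines: the first is TAB delimited, the second space delimited
writeLines(c("site\tcount", "A 12"), demo_file)

# readLines() returns each line of the file as a character string;
# printing the strings displays a TAB as the escape sequence \t
raw_lines <- readLines(demo_file)
print(raw_lines)

# grepl() reports which lines contain a TAB character
grepl("\t", raw_lines, fixed = TRUE)
```

This is a quick check rather than a replacement for a good text editor, but it is often enough to tell a TAB delimited file from a space delimited one.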
Here are two video tutorials on this topic
Video Tutorial: Viewing data in a text file before importing into R (4 mins)
Video Tutorial: An overview of the common data text file formats (3 mins)
Data import examples
The data we'll be importing are described at http://www.ucd.ie/ecomodel/Resources/datasets_WebVersion.html
The files are:
- WOLF.CSV: This file is a text file of comma separated values.
- HEIGHT.CSV: This file is a text file of comma separated values.
- INSECT.TXT: This file is a text file of TAB delimited values.
- BEEKEEPER.TXT: This file is a text file with blank space delimiting the values.
- MALIN_HEAD.TXT: This file is a text file with TAB delimited values.
All these data files are simple text files that differ in the character used to distinguish columns of data.
Comma delimited files (CSV files)
CSV stands for comma separated values (note that semi-colons are sometimes used in place of commas because some countries use the comma in place of the decimal point).
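The semi-colon convention can be handled by base R as well. The sketch below is an illustration only: it writes a tiny invented file that uses semi-colons as delimiters and commas as decimal marks, then reads it in two ways, with read.table() and with read.csv2(), whose defaults are sep = ";" and dec = ",".

```r
# Sketch: a semi-colon delimited file that uses decimal commas
semi_file <- file.path(tempdir(), "semicolon.csv")  # hypothetical file
writeLines(c("site;length", "A;1,5", "B;2,25"), semi_file)

# Option 1: read.table() with the delimiter and decimal mark set explicitly
d1 <- read.table(semi_file, header = TRUE, sep = ";", dec = ",")

# Option 2: read.csv2() uses sep = ";" and dec = "," by default
d2 <- read.csv2(semi_file)

d1$length   # numeric values 1.5 and 2.25
```

If dec = "," is left out, the length column would be read as text rather than as numbers.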
The read.table() function is a flexible function for importing text data.
Video Tutorial: Importing a CSV file into R using read.table() (5 mins)
    # Import WOLF.CSV file using read.table function
    wolf = read.table('WOLF.CSV', header= TRUE, sep= ',')
The wolf variable contains the imported data. It is called a data frame.
The ideal arrangement of a data frame is for each row to be an observation of some object and each column a variable that measures some property of the object. For example, each row of wolf is an observation of one individual wolf and each column of wolf gives information about where the wolf was observed and the data collected from its hair sample.
The HEIGHT.CSV file also contains comma separated values. Here is the read.table() command to read in this file
    # Import HEIGHT.CSV file using read.table function
    human = read.table('HEIGHT.CSV', header= TRUE, sep= ',')
Note: The function read.csv() is a special case of the read.table() function.
Use the R help pages to learn more about these functions
    ?read.table   # Display help page on read.table function
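To see what "special case" means in practice, the sketch below reads the same small file with both functions and compares the results. The file and its contents are invented for the demonstration; for a simple file like this one, read.csv() with no extra arguments should match read.table() with header = TRUE and sep = ",".

```r
# Sketch: read.csv() compared with read.table() on a simple CSV file
csv_demo <- file.path(tempdir(), "demo.csv")  # hypothetical file
writeLines(c("id,weight", "1,10.5", "2,12.0", "3,9.8"), csv_demo)

# read.table() needs to be told about the header row and the delimiter
a <- read.table(csv_demo, header = TRUE, sep = ",")

# read.csv() assumes a header row and a comma delimiter by default
b <- read.csv(csv_demo)

all.equal(a, b)   # compares the two data frames
```

The two functions do differ in some other defaults (for example how quotes and comment characters are treated), so the help page is worth reading before relying on either for an unusual file.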
TAB delimited files (TXT files)
The INSECT.TXT data set is a text file where variables are delimited by a TAB. In addition the first three lines contain a data description that we do not want to import.
The read.table() function can be used to import this file. The argument skip=3 is used to ignore the first three lines. The argument sep='\t' specifies a TAB as the variable delimiter
    # Import INSECT.TXT file using read.table function (TAB delimited)
    # skipping the first 3 lines (skip=3)
    insect = read.table('INSECT.TXT', header=TRUE, skip= 3, sep= '\t')
The MALIN_HEAD.TXT file also contains TAB delimited data. Here is the read.table() command to read in this file
    # Import MALIN_HEAD.TXT file using read.table function (TAB delimited)
    rainfall = read.table('MALIN_HEAD.TXT', header=TRUE, sep= '\t')
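If you do not have the tutorial's data files to hand, the effect of skip and sep='\t' can be tried on a small file created from R. This is a sketch with invented contents: three description lines followed by a TAB delimited table.

```r
# Sketch: a TAB delimited file with three description lines at the top
tab_file <- file.path(tempdir(), "tab_demo.txt")  # hypothetical file
writeLines(c("Description line 1",
             "Description line 2",
             "Description line 3",
             "species\tcount",
             "beetle\t4",
             "moth\t7"), tab_file)

# skip = 3 ignores the description lines; sep = "\t" splits columns on TABs
tab_data <- read.table(tab_file, header = TRUE, skip = 3, sep = "\t")
str(tab_data)
```

Without skip = 3, read.table() would try to treat the first description line as the header row and the import would fail or give the wrong columns.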
Blank space delimited files
The BEEKEEPER.TXT data set uses white space to delimit the variables. The first six lines of the file contain a description of the data.
Using read.table() with the argument sep='' will interpret any white space as a variable delimiter.
    # Import BEEKEEPER.TXT file using read.table function (white space delimited)
    # skipping the first 6 lines (skip=6)
    bees = read.table('BEEKEEPER.TXT', header=TRUE, skip= 6, sep= '')
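With sep='' a run of one or more spaces or TABs counts as a single delimiter, so columns do not need to be separated by exactly one space. The sketch below shows this on a small invented file where the amount and type of white space varies from line to line.

```r
# Sketch: white space delimited file with uneven spacing between columns
ws_file <- file.path(tempdir(), "ws_demo.txt")  # hypothetical file
writeLines(c("hive   honey_kg",
             "A      12.5",
             "B \t  9.0"), ws_file)

# sep = "" treats any run of spaces or TABs as one delimiter
ws_data <- read.table(ws_file, header = TRUE, sep = "")
ws_data
```

This flexibility is also the weakness of the format: a text value that itself contains a space would be split into two columns.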
Summary of important commands
Type of text file | R Command |
---|---|
Comma delimited (.CSV) | read.table(<filename>, header=TRUE, sep=',') |
TAB delimited (.TXT) | read.table(<filename>, header=TRUE, sep='\t') |
Blank space (.TXT) | read.table(<filename>, header=TRUE, sep='') |
    # Comma separated values
    wolf = read.table('WOLF.CSV', header= TRUE, sep= ',')
    human = read.table('HEIGHT.CSV', header= TRUE, sep= ',')
    # TAB delimited values
    insect = read.table('INSECT.TXT', header=TRUE, skip= 3, sep= '\t')
    rainfall = read.table('MALIN_HEAD.TXT', header=TRUE, sep= '\t')
    # White space delimited values
    bees = read.table('BEEKEEPER.TXT', header=TRUE, skip= 6, sep= '')
Importing data using RStudio
RStudio has its own data import functionality. To use this you will need to install the R package readr. For more information about this see RStudio's guide: https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data-with-RStudio
Video Tutorial: Importing a CSV file into R using RStudio's GUI (3 mins 13 secs)
Importing data using RStudio will save the data as a modified data frame, called a tibble (tibbles are briefly discussed above).
Importing using fread()
fread() is a powerful data import function that is similar to read.table() but faster. It is part of the data.table package, which you will need to install.
You should only have to give fread() the name of the file you want to import, and fread() will try to work out the appropriate way to import the data. Try some examples and compare them with the examples above
    # ******************************************
    # Other packages for importing data --------
    # The data.table package
    library(data.table)   # Load the data.table package
    # Import a CSV file
    wolf2 = fread('WOLF.CSV')
    human2 = fread('HEIGHT.CSV')
    # Import TAB delimited file
    insect2 = fread('INSECT.TXT')
    rainfall2 = fread('MALIN_HEAD.TXT')
    # Import white space delimited file
    bees2 = fread('BEEKEEPER.TXT')
The fread() command is simpler to use because it tries to guess the format of the data in the file.
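One way to compare the two approaches is to read the same file with both and look at the results side by side. The sketch below uses a small invented CSV file and only calls fread() if the data.table package is installed; note that fread() returns a data.table object, which as.data.frame() converts to an ordinary data frame.

```r
# Sketch: compare fread() with read.table() on the same file.
# Assumption: data.table may not be installed, so check before using it.
fread_file <- file.path(tempdir(), "fread_demo.csv")  # hypothetical file
writeLines(c("id,weight", "1,10.5", "2,12.0"), fread_file)

# Base R import: header and delimiter are stated explicitly
base_data <- read.table(fread_file, header = TRUE, sep = ",")

if (requireNamespace("data.table", quietly = TRUE)) {
  # fread() guesses the delimiter and whether there is a header row
  fast_data <- data.table::fread(fread_file)

  # fread() returns a data.table; convert it to a plain data frame
  fast_df <- as.data.frame(fast_data)

  # Report whether the two imports agree
  print(all.equal(base_data, fast_df))
}
```

For small files the speed difference is not noticeable; the automatic guessing is convenient, but it is still worth checking the imported data with str().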
Summary of the topics covered
- Organising your files on your computer
- Best practice for formatting data
- Reading in spreadsheet data
- Data frames
Further Reading
All these books can be found in UCD's library
- Andrew P. Beckerman and Owen L. Petchey, 2012 Getting Started with R: An introduction for biologists (Oxford University Press, Oxford) [Chapter 2, 3]
- Mark Gardner, 2012 Statistics for Ecologists Using R and Excel (Pelagic, Exeter)
- Michael J. Crawley, 2015 Statistics: an introduction using R (John Wiley & Sons, Chichester) [Chapter 2]
- Tenko Raykov and George A. Marcoulides, 2013 Basic statistics: an introduction with R (Rowman and Littlefield, Plymouth)
Source: https://www.ucd.ie/ecomodel/Resources/Sheet2a_data_import_WebVersion.html