How to Upload Data in R From .txt

Importing Data into R

A tutorial about data analysis using R

Dr Jon Yearsley (School of Biology and Environmental Science, UCD)

  • Objectives
  • Organise yourself!
  • Data Workflow
  • Format your data (tidy data)
  • Data frames
  • Importing spreadsheet data
  • Summary of the topics covered
  • Further Reading

How to Read this Tutorial

This tutorial is a mixture of R code chunks and explanations of the code. The R code chunks will appear in boxes.

Below is an example of a chunk of R code:

    # This is a chunk of R code. All text after a # symbol is a comment

    # Set working directory using setwd() function
    setwd('Enter the path to my working directory')

    # Clear all variables in R's memory
    rm(list=ls())   # Standard code to clear R's memory

Sometimes the output from running this R code will be displayed after the chunk of code. R output will be preceded by ##.

Here is a chunk of code followed by the R output

    2 + 4   # Use R to add two numbers
    ## [1] 6

Objectives

The objectives of this tutorial are:

  1. Demonstrate good practice in data organisation
  2. Introduce plain text file formats for data
  3. Explain data import into R

Organise yourself!

Before you start importing data into R you should take time to organise your workspace on your computer:

  • Create a folder on your computer to contain all your work for this particular project (e.g. a folder called DataModule)
  • Within this project folder create another folder called data. This will hold all the raw data files. These raw data files should not be changed.
  • Inside this project folder create a text file called MyFirstScript.R. You can use RStudio for this (use the File->New File->R Script menu option) or any basic text editor (e.g. Notepad, TextEdit, gedit, emacs). This file will be your R script that will contain all the commands for R. The .r or .R suffix is the standard suffix for an R script.
  • If you are starting a big project consider creating separate folders for: R scripts, figures, output from the R script
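
The folder layout described above can also be created from within R. The following is a minimal sketch, assuming the project lives in a temporary directory (the location is a placeholder chosen so the example is safe to run; replace it with your own path):

```r
# Hypothetical location for the project (a temporary directory).
# Replace this with the path where you keep your own work.
project_dir = file.path(tempdir(), 'DataModule')

# Create the project folder and, inside it, the data folder for raw data files
dir.create(file.path(project_dir, 'data'), recursive = TRUE, showWarnings = FALSE)

# Create an empty R script inside the project folder
script_file = file.path(project_dir, 'MyFirstScript.R')
file.create(script_file)

# List everything inside the project folder
list.files(project_dir, recursive = TRUE, include.dirs = TRUE)
```

The functions dir.create(), file.create() and list.files() are all part of base R, so no extra packages are needed.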

Your first R script

Now you have created the file MyFirstScript.R you should put some header text at the start of the file to explain what the R script will do. This was described in tutorial 1.

Video Tutorial: Creating a new R script with RStudio (1 min)

The text should have a short explanation of the R script followed by your name and the date you wrote the R script. Each line should start with a # so that the text is not interpreted by R (this text is for humans, so they understand what the file is intended to do). Here is an example,

    # ********** Start of header **************
    # Title: <The title of your R script>
    #
    # Add a short description of the R script here.
    #
    # Author: <your name>  (email address)
    # Date: <today's date>
    #
    # *********** End of header ****************

    # Two common commands at the start of an R script are:
    rm(list=ls())         # Clear R's memory
    setwd('~/DataModule') # Set the working directory
    # Replace '~/DataModule' with the name of your own directory

    # ******************************************
    # Write your commands below.
    # Remember to use comments to explain your commands

Writing clear R scripts

An R script isn't just telling the computer how to perform calculations on your data. It is also explaining your working to other human beings.

"Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do." – Donald E. Knuth

To make your R scripts usable by humans they must be clearly commented (using the # symbol to start a comment) and clearly organised.

As you write an R script consider these questions:

  • Does your R script look well organised (e.g. is it well spaced, are lines indented logically)?
  • Could someone else read the R script and understand the basic idea?
  • Could someone else modify your R script relatively easily?
  • In a couple of months' time could you quickly read and edit your own R script?

Professional data analysts take clarity very seriously. Here are some links to R coding style guides:

  1. Google's style guide, https://google.github.io/styleguide/Rguide.xml
  2. Hadley Wickham's style guide, http://adv-r.had.co.nz/Style.html
  3. http://www.stat.ubc.ca/~jenny/STAT545A/block19_codeFormattingOrganization.html
  4. http://nicercode.github.io/blog/2013-04-05-why-nice-code/

Data Workflow

Below is a schematic of the workflow for handling data.

Figure: The workflow to follow when handling data.

In this tutorial we will consider formatting data, in the next tutorial we'll discuss importing data, and then we'll start to consider exploring the data using graphics and numerical summaries.

Format your data (tidy data)

The workflow starts long before you analyse your data. It starts even before you have your data in some computer software.

Organising your data should follow tidy data guidelines (see below) and be planned before you collect your data. The format of the data should be finalised before importing the data into R. It is often easiest to tidy your data using a spreadsheet program before you import the data into R.

Well organised data from the start will make your life a lot easier and your data import as painless as possible.

Six guidelines for tidy data

When tidying your data you should ensure that:

  1. each variable has its own column
  2. each row is an observation
  3. the top of each column contains the name of the variable
  4. there are no blank columns or blank rows between data
  5. all data in a column has the same type (e.g. it is all numerical data, or it is all text data)
  6. data are consistent (e.g. if a binary variable can take values 'Yes' or 'No' then only these two values are allowed, with no alternatives such as 'Y' and 'N')
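
Guidelines 5 and 6 can also be checked once data are in R. The sketch below uses a small made-up data frame (the name survey, its columns and its values are invented for illustration) to show one way of spotting an inconsistent binary variable:

```r
# A small invented data set with an inconsistent binary variable
survey = data.frame(
  height = c(1.62, 1.75, 1.80, 1.68),
  smoker = c('Yes', 'No', 'Y', 'No'),
  stringsAsFactors = FALSE
)

# Guideline 5: check the data type of each column
sapply(survey, class)

# Guideline 6: list the distinct values of the binary variable
unique(survey$smoker)

# Find the rows with values other than 'Yes' or 'No'
bad_rows = which(!(survey$smoker %in% c('Yes', 'No')))
bad_rows
```

Here the third row is flagged because it contains 'Y' rather than 'Yes'. It is usually better to correct such values in the original data file than to patch them afterwards.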

PDF Summary: This PDF document reiterates the concept of tidy data

The link to the PDF is: http://www.ucd.ie/ecomodel/pdf/TidyData.pdf

Poorly vs well formatted data

The data set shown in the figure below is an example of poorly formatted data. The data set contains data on the lead concentrations (ppm) from three species of fish (whitefish, sucker and trout). Two types of sample were collected: samples from fillets of fish and from whole fish. The data has three variables: lead concentration, species of fish and type of fish sample.

Figure: A poorly formatted data set. This file would be hard to import and analyse in this format.

How would you improve the format of the poorly formatted data shown in the figure? (Hint: use the six guidelines above)

The second figure shows some well formatted data that follows the tidy data guidelines: each column represents a single variable and each row an observation.

Figure: A well formatted data set. This file would be easy to import and analyse in this format. One column contains the data for one variable. These data are the worldwide occurrences of Covid-19, downloaded from the European Centre for Disease Prevention and Control, https://www.ecdc.europa.eu/en

Data frames

A data frame is R's name for spreadsheet data (e.g. data organised in a grid, like Excel). R stores the vast majority of data as a data frame and uses data frames when analyzing data.

A data frame forces the data to be well organised.

  • Each column is a variable. The name of this variable becomes the name of the column.
  • Each row corresponds to an observation. This means that values in the same row are data collected about the same object. Rows can also have names.

Below is an example of a data frame (called airquality) that contains data on the air quality in New York from May - September 1973 (this is a data set that is built in to R).

    # The airquality data is a built-in dataset

    # First 10 rows of the airquality data frame
    head(airquality, n=10)
    ##    Ozone Solar.R Wind Temp Month Day
    ## 1     41     190  7.4   67     5   1
    ## 2     36     118  8.0   72     5   2
    ## 3     12     149 12.6   74     5   3
    ## 4     18     313 11.5   62     5   4
    ## 5     NA      NA 14.3   56     5   5
    ## 6     28      NA 14.9   66     5   6
    ## 7     23     299  8.6   65     5   7
    ## 8     19      99 13.8   59     5   8
    ## 9      8      19 20.1   61     5   9
    ## 10    NA     194  8.6   69     5  10

You can type ?airquality to display the help file for this data set. The data frame has 153 rows (observations) and 6 columns (variables measured). The 6 columns contain data on: ozone concentrations (parts per billion), solar radiation, wind speed, air temperature, month and day of observation. You can see that each column has a name corresponding to the data for that column.

The structure of the data frame can be viewed using the str() function

    # Display the structure of the airquality data frame
    str(airquality)
    ## 'data.frame':    153 obs. of  6 variables:
    ##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
    ##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
    ##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
    ##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
    ##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
    ##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

The str() function shows that this is a data frame with 153 observations (rows) and six variables (columns). It also shows the data types of the variables: Wind is a numerical variable (i.e. continuous) and the other variables are all integers (i.e. whole numbers).
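
Several other base R functions report the same information piece by piece. The following sketch, using only the built-in airquality data, shows a few that are often handy (the variable names col_types, wind and n_missing are chosen here for illustration):

```r
# Number of rows (observations) and columns (variables)
dim(airquality)
nrow(airquality)
ncol(airquality)

# Names of the variables (the column names)
names(airquality)

# Data type of each column
col_types = sapply(airquality, class)
col_types

# Extract a single column using the $ operator and summarise it
wind = airquality$Wind
summary(wind)

# Count the missing values (NA) in each column
n_missing = colSums(is.na(airquality))
n_missing
```

Checking the number of rows, the column types and the number of missing values is a quick way to confirm that a data frame contains what you expect.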

Tidy data in R is described in more particular on this web folio: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html

Tibbles

A recent development (circa 2016) is an improved data frame called a tibble. We will not discuss these new data frame objects here, but you can read about them at https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html.

Don't Panic! Tibbles are very similar to data frames.

The important point to know is that if you use RStudio's GUI to import data then your data will be stored in a tibble, not a data frame.

Importing spreadsheet data

To start working with data in R you need to import your data into R. You are aiming to have a data frame that contains your data.

The simplest way to import data into R is from a text file (https://en.wikipedia.org/wiki/Text_file). Text files (sometimes called flat files) can be read by any computer operating system and by many different statistical programs. Saving data as a simple text file makes your data highly transportable.

Importing data from software specific formats (e.g. Excel's .XLSX format, Minitab's .MTW format, SPSS's .SAV format or SAS's .SAS format) is possible (e.g. using RStudio's Import Dataset GUI). If you want your data to be easily shared with other people then use a text file to store your data.

We advise you to:

  • save your data as a text file (software, such as Excel, often has an option to save data as plain text)
  • organize data with columns corresponding to different variables before exporting to the text file
  • use a visible text character to delimit each column (commonly a comma or semi-colon). Using an invisible character (e.g. a space or a TAB) is not recommended because these characters all look the same at first glance.
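
If your data are already in R, the same advice applies when saving them. The sketch below writes the built-in airquality data to a comma delimited text file and reads it back; the file name and the temporary directory are placeholders chosen for illustration:

```r
# Hypothetical output file in a temporary directory
csv_file = file.path(tempdir(), 'airquality.csv')

# Save as comma separated values, without the row names
write.csv(airquality, file = csv_file, row.names = FALSE)

# Look at the first three lines of the text file
readLines(csv_file, n = 3)

# Read the file back in and check that nothing was lost
check = read.table(csv_file, header = TRUE, sep = ',')
dim(check)
```

Setting row.names = FALSE stops R from writing an extra unnamed column of row numbers, which keeps the file consistent with the tidy data guidelines.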

General advice on importing data into R can be found at https://cran.r-project.org/doc/manuals/r-release/R-data.html

Converting data to a CSV text file

A comma separated values file (CSV file) is the most common format for a text file that contains data.

Here are a few video tutorials on converting data into a CSV text file so that it is suitable for import into R.

Video Tutorial: Converting data from EXCEL to a CSV format (3 mins)

Video Tutorial: Converting data from Googlesheets to a CSV format (1 min)

Viewing text files

Before importing a text file into any software package it is a huge help if you can look at it in a text editor. Text files can contain characters that are normally invisible (e.g. spaces, tabs and end of line markers). If a text editor is going to be of use it must be able to display all the characters in a file.

Three text editors that can do this are:

  • notepad++ is a free program for Windows operating systems
  • BBEdit is a free program for Mac OSX operating systems
  • emacs is a GNU open source program primarily for Linux operating systems.

On Linux systems the cat -A command from the terminal is also useful.
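
R itself can also reveal invisible characters. As a rough sketch, the code below creates a small TAB delimited file containing invented data and then prints its raw lines, so that each TAB shows up as \t:

```r
# Create a small TAB delimited file to look at (invented data)
txt_file = file.path(tempdir(), 'example.txt')
writeLines(c('site\tcount', 'A\t12', 'B\t7'), txt_file)

# Read the raw lines of the file without interpreting them as data
raw_lines = readLines(txt_file)

# print() displays each TAB character as \t, making it visible
print(raw_lines)

# Count the TAB characters in each line
n_tabs = lengths(regmatches(raw_lines, gregexpr('\t', raw_lines)))
n_tabs
```

If every line has the same number of delimiters, the file should import as a regular grid of rows and columns.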

Here are two video tutorials on this topic

Video Tutorial: Viewing data in a text file before importing into R (4 mins)

Video Tutorial: An overview of the common data text file formats (3 mins)

Data import examples

The data we'll be importing are described at http://www.ucd.ie/ecomodel/Resources/datasets_WebVersion.html

The files are:

  • WOLF.CSV: This file is a text file of comma separated values.
  • HEIGHT.CSV: This file is a text file of comma separated values.
  • INSECT.TXT: This file is a text file of TAB delimited values.
  • BEEKEEPER.TXT: This file is a text file with blank space delimiting the values.
  • MALIN_HEAD.TXT: This file is a text file with TAB delimited values.

All these data files are simple text files that differ in the character used to distinguish columns of data.

Comma delimited files (CSV files)

CSV stands for comma separated values (note that semi-colons are sometimes used in place of commas because some countries use the comma in place of the decimal point).

The read.table() function is a flexible function for importing text data

Video Tutorial: Importing a CSV file into R using read.table() (5 mins)

    # Import WOLF.CSV file using read.table function
    wolf = read.table('WOLF.CSV', header=TRUE, sep=',')

The wolf variable contains the imported data. It is called a data frame.

The ideal arrangement of a data frame is for each row to be an observation of some object and each column a variable that measures some property of the object. For example, each row of wolf is an observation of one individual wolf and each column of wolf gives information about where the wolf was observed and the data collected from its hair sample.
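
After any import it is worth checking that the data frame looks as expected. The sketch below does this for a tiny CSV file written to a temporary directory; the file, its column names and its values are invented for illustration and are not the real WOLF.CSV data:

```r
# Write a tiny CSV file with invented values
csv_file = file.path(tempdir(), 'example_import.csv')
writeLines(c('id,region,weight',
             '1,North,15.9',
             '2,South,20.0',
             '3,South,NA'), csv_file)

# Import the file
example = read.table(csv_file, header = TRUE, sep = ',')

# Check the size, the column names and the data types
dim(example)
names(example)
str(example)

# Look at the first few rows
head(example)
```

The text NA in the file is imported as a missing value, which is the default behaviour of read.table().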

The HEIGHT.CSV file also contains comma separated values. Here is the read.table() command to read in this file

    # Import HEIGHT.CSV file using read.table function
    human = read.table('HEIGHT.CSV', header=TRUE, sep=',')

Note: The function read.csv() is a special case of the read.table() function.
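
To illustrate the relationship between these functions, the sketch below reads the same invented data in four ways: with read.csv(), with read.table(), and from a semi-colon separated file (with a decimal comma) using read.table() and read.csv2(). The file names and values are made up for this example:

```r
# Create two small files holding the same invented data
comma_file = file.path(tempdir(), 'comma.csv')
semi_file  = file.path(tempdir(), 'semicolon.csv')
writeLines(c('name,length', 'a,1.5', 'b,2.25'), comma_file)
writeLines(c('name;length', 'a;1,5', 'b;2,25'), semi_file)

# read.csv() assumes header=TRUE and sep=',' by default
d1 = read.csv(comma_file)
d2 = read.table(comma_file, header = TRUE, sep = ',')

# For semi-colon separated files with a decimal comma use sep=';' and dec=','
d3 = read.table(semi_file, header = TRUE, sep = ';', dec = ',')

# read.csv2() is the shortcut for this format
d4 = read.csv2(semi_file)

# Compare the results: each comparison returns TRUE if the data frames match
all.equal(d1, d2)
all.equal(d1, d3)
all.equal(d1, d4)
```

Using read.table() with explicit arguments makes the file format visible in your script, whereas read.csv() and read.csv2() are shorter to type.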

Use the R help pages to learn more about these functions

    ?read.table   # Display help page on read.table function

TAB delimited files (TXT files)

The INSECT.TXT data set is a text file where variables are delimited by a TAB. In addition the first three lines contain a data description that we do not want to import.

The read.table() function can be used to import this file. The argument skip=3 is used to ignore the first three lines. The argument sep='\t' specifies a TAB as the variable delimiter

    # Import INSECT.TXT file using read.table function (TAB delimited)
    # skipping the first 3 lines (skip=3)
    insect = read.table('INSECT.TXT', header=T, skip=3, sep='\t')
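
If you do not have INSECT.TXT to hand, the effect of skip can be seen with a small self-contained example. The file below is created in a temporary directory and its contents are invented for illustration:

```r
# Create a small TAB delimited file with three description lines (invented)
tab_file = file.path(tempdir(), 'example_skip.txt')
writeLines(c('Description line 1',
             'Description line 2',
             'Description line 3',
             'species\tcount',
             'beetle\t4',
             'moth\t9'), tab_file)

# Skip the three description lines, then read the header and the data
example_tab = read.table(tab_file, header = TRUE, skip = 3, sep = '\t')
example_tab
```

Writing header=TRUE in full is slightly safer than header=T, because T is an ordinary variable in R that can be overwritten, whereas TRUE cannot.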

The MALIN_HEAD.TXT file also contains TAB delimited data. Here is the read.table() command to read in this file

    # Import MALIN_HEAD.TXT file using read.table function (TAB delimited)
    rainfall = read.table('MALIN_HEAD.TXT', header=T, sep='\t')

Blank space delimited files

The BEEKEEPER.TXT data set uses white space to delimit the variables. The first six lines of the file contain a description of the data

Using read.table() with the argument sep='' will interpret any white space as a variable delimiter.

    # Import BEEKEEPER.TXT file using read.table function (white space delimited)
    # skipping the first 6 lines (skip=6)
    bees = read.table('BEEKEEPER.TXT', header=T, skip=6, sep='')
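
The behaviour of sep='' can be checked with a small invented file in which the columns are separated by differing amounts of white space:

```r
# Create a small file with irregular white space between values (invented data)
ws_file = file.path(tempdir(), 'example_space.txt')
writeLines(c('hive   honey  year',
             'A      12.5   2019',
             'B  9.75 2020'), ws_file)

# sep='' treats any run of spaces or TABs as a single delimiter
example_ws = read.table(ws_file, header = TRUE, sep = '')
example_ws
```

One consequence of this format is that a value containing a space (e.g. a two word name) must be enclosed in quotes in the file, otherwise it will be split across two columns.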

Summary of important commands

    Type of text file          R Command
    Comma delimited (.CSV)     read.table(<filename>, header=T, sep=',')
    TAB delimited (.TXT)       read.table(<filename>, header=T, sep='\t')
    Blank space (.TXT)         read.table(<filename>, header=T, sep='')

    # Comma separated values
    wolf = read.table('WOLF.CSV', header=TRUE, sep=',')
    human = read.table('HEIGHT.CSV', header=TRUE, sep=',')

    # TAB delimited values
    insect = read.table('INSECT.TXT', header=T, skip=3, sep='\t')
    rainfall = read.table('MALIN_HEAD.TXT', header=T, sep='\t')

    # White space delimited values
    bees = read.table('BEEKEEPER.TXT', header=T, skip=6, sep='')

Importing information using RStudio

RStudio has its own data import functionality. To use this you will need to install the R package readr. For more information about this see RStudio's guide: https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data-with-RStudio

Video Tutorial: Importing a CSV file into R using RStudio's GUI (3 mins 13 secs)

Importing data using RStudio will save the data as a modified data frame, called a tibble (tibbles are briefly discussed above).

Importing using fread()

fread() is a powerful data import function that is similar to read.table() but faster. It is part of the data.table package, which you will need to install.

You should only have to give fread() the name of the file you want to import, and fread() will try to work out the appropriate way to import the data. Try some examples and compare them with the examples above

    # ******************************************
    # Other packages for importing data --------
    # The data.table package

    library(data.table)   # Load the data.table package

    # Import a CSV file
    wolf2 = fread('WOLF.CSV')
    human2 = fread('HEIGHT.CSV')

    # Import TAB delimited file
    insect2 = fread('INSECT.TXT')
    rainfall2 = fread('MALIN_HEAD.TXT')

    # Import white space delimited file
    bees2 = fread('BEEKEEPER.TXT')

The fread() command is simpler to use because it tries to guess the format of the data in the file.
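
One point to be aware of is that fread() returns a data.table object rather than a plain data frame. The sketch below, which uses an invented file and only runs the import if the data.table package is installed, shows how to check the class of the result and convert it:

```r
# Create a small CSV file (invented data)
csv_file = file.path(tempdir(), 'example_fread.csv')
writeLines(c('id,mass', '1,2.5', '2,3.5'), csv_file)

# Only run the import if the data.table package is installed
if (requireNamespace('data.table', quietly = TRUE)) {
  d_fread = data.table::fread(csv_file)

  # A data.table is an extended kind of data frame
  print(class(d_fread))

  # Convert to a plain data frame if required
  d_plain = as.data.frame(d_fread)
  print(d_plain)
} else {
  message('Install the package first with install.packages("data.table")')
}
```

Most functions that accept a data frame will also accept a data.table, so the conversion is only needed if you want to be sure of standard data frame behaviour.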

Summary of the topics covered

  • Organizing your files on your computer
  • Best practice for formatting data
  • Reading in spreadsheet data
  • Data frames

Further Reading

All these books can be found in UCD's library

  • Andrew P. Beckerman and Owen L. Petchey, 2012 Getting Started with R: An introduction for biologists (Oxford University Press, Oxford) [Chapter 2, 3]
  • Mark Gardner, 2012 Statistics for Ecologists Using R and Excel (Pelagic, Exeter)
  • Michael J. Crawley, 2015 Statistics : an introduction using R (John Wiley & Sons, Chichester) [Chapter 2]
  • Tenko Raykov and George A Marcoulides, 2013 Basic statistics: an introduction with R (Rowman and Littlefield, Plymouth)

Source: https://www.ucd.ie/ecomodel/Resources/Sheet2a_data_import_WebVersion.html
