Friday, February 27, 2015

R: Removing columns from a Dataset

If you need to remove certain columns from your dataset, you can drop them with a negative column index. Here the fifth column (Species) is removed:

> iris[1:5,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
> iris[,-5][1:5,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3          4.7         3.2          1.3         0.2
4          4.6         3.1          1.5         0.2
5          5.0         3.6          1.4         0.2
> 
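The same idea also works with column names instead of positions. The snippet below is a minimal sketch using the built-in iris dataset; the variable names drop_cols, iris_small and one_col are made up for illustration.

```r
# Drop columns by name instead of by position
drop_cols <- c("Species", "Petal.Width")
iris_small <- iris[, !(names(iris) %in% drop_cols)]
head(iris_small, 5)

# When only one column is left, drop = FALSE keeps the result a data frame
# instead of collapsing it to a plain vector
one_col <- iris[, "Sepal.Length", drop = FALSE]
```

Dropping by name keeps working even if the column order of the dataset changes later.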

Monday, February 16, 2015

R : Using grepl

Sometimes you need to select and display all columns of a dataset except for one or two.
Identifying those columns by index number is not always possible, and when the columns are well named, using index numbers hurts code readability. In that case, select the columns by name with grepl:

> studentData <- data.frame(ID=paste0("Student",1:50),
+                           Math=sample(100,50),
+                           Science=sample(100,50),
+                           History=sample(100,50),
+                           Final=sample(100,50))
> studentData
          ID Math Science History Final
1   Student1   15      78      41    93
2   Student2   85      46      52    75
3   Student3   10      12      17    99.....

> studentData[,!grepl("ID",colnames(studentData))]
   Math Science History Final
1    15      78      41    93
2    85      46      52    75
3    10      12      17    99...
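
Note that grepl("ID", ...) matches any column whose name contains "ID". The sketch below shows an anchored pattern for exact matches and an alternation for dropping several columns at once. The set.seed call and the names noId and scores are assumptions added only for this illustration.

```r
set.seed(1)  # assumption: fixed seed so the sample data is reproducible
studentData <- data.frame(ID = paste0("Student", 1:50),
                          Math = sample(100, 50),
                          Science = sample(100, 50),
                          History = sample(100, 50),
                          Final = sample(100, 50))

# Anchored pattern: matches only the column named exactly "ID"
noId <- studentData[, !grepl("^ID$", colnames(studentData))]

# Alternation: drops both "ID" and "Final" in one step
scores <- studentData[, !grepl("^(ID|Final)$", colnames(studentData))]

colnames(noId)
colnames(scores)
```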
Happy programming!!

Wednesday, February 11, 2015

R: Initializing an empty list

If you are new to R, there is a good chance that you will want to grow your dataset (data frame) dynamically. As we all know, dynamic memory allocation is a tricky business, and in R it is even more so.

In R, growing objects dynamically inside loops and functions hurts program performance. Each time a vector grows in a loop iteration, R typically has to allocate a larger vector and copy the existing values into it, which slows the loop down.
The best way to avoid this slowdown is to predefine the size of the vector and fill in its values inside the for loop, whenever possible.

For example:
> emptyData <- rep(NA, 100000)      # 100,000 NA placeholders to fill in later
> emptyData <- rep(1:100)           # same as 1:100
> emptyData <- rep(1:10, times=3)   # 1 to 10, repeated three times
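
To make the difference concrete, here is a small sketch that builds the same vector twice, once by growing it and once by filling a preallocated one, and also preallocates an empty list as in the post title. The names n, grown, filled and emptyList are hypothetical, and the loop size is arbitrary.

```r
n <- 10000

# Growing: every c() call builds a new, longer vector
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Preallocating: the size is fixed up front and values are filled in by index
filled <- numeric(n)
for (i in seq_len(n)) {
  filled[i] <- i^2
}

# Both approaches produce the same values
identical(grown, filled)

# An empty list of a known length can be preallocated the same way
emptyList <- vector("list", 5)
```

Wrapping each loop in system.time() is one way to compare the two approaches on your own machine.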
Happy programming!

Tuesday, February 10, 2015

R : Reading data from CSV file

To read data from a CSV file, you can use the read.table function in R:

> myData <- read.table("C:/Downloads/R_Tutorial/18MarchStudentInfo.csv", header=TRUE, sep=",")
The above command assumes your CSV file has a header row at the beginning of the file describing the column names. If your CSV does not contain a row describing the column names, you can read it as:

> myData <- read.table("C:/Downloads/R_Tutorial/18MarchStudentInfo.csv", header=FALSE, sep=",")
Change the sep argument to match the column delimiter used in your CSV.
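
The effect of the header argument can be seen in the following self-contained sketch. It writes a small, made-up CSV to a temporary file (the file contents and the names tmp, withHeader and noHeader are assumptions for illustration) and reads it back both ways. read.csv is the base R wrapper around read.table that defaults to header=TRUE and sep=",".

```r
# Hypothetical sample data written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,Math,Science",
             "Student1,15,78",
             "Student2,85,46"), tmp)

# First line is used as column names
withHeader <- read.csv(tmp)

# First line is treated as data; columns are named V1, V2, V3
noHeader <- read.table(tmp, header = FALSE, sep = ",")

names(withHeader)
names(noHeader)
```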
