How to Clean Data at the Command Line

Learn how to clean data using the command-line utilities jq, xsv, and csvkit.

About the Product

Cleaning data is a well-known step that lets us explore data and see beyond its raw form. Many technologies can handle this task, but there is a recurring problem.

The data-driven problem we face: whenever you want to import a CSV file, you go to Google out of habit to find the two lines you always forget (in Python, for example), then open your text editor, create a file, and paste in what you found.

Why the command line?

Even the simplest data cleaning tasks can feel frustrating or time-wasting. Maybe you use a higher-level library like Pandas, but I bet you still write more code than you would in the terminal, where several lines of code can be packed into a single one-liner.
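As a rough illustration of that point, here is a minimal sketch that previews a CSV file and counts its rows using only standard tools. The sample file and its columns (date, state, cases) are hypothetical and are created inside the snippet so it runs on its own. Note that cut splits on every comma, so this only suits simple CSV files without quoted fields.

```shell
# Hypothetical sample file, created here only so the sketch is self-contained.
csv_file=$(mktemp)
printf 'date,state,cases\n2020-04-01,NY,100\n2020-04-01,CA,50\n2020-04-02,NY,120\n' > "$csv_file"

# Show the header and the first two records, keeping only columns 2 and 3.
preview=$(head -n 3 "$csv_file" | cut -d, -f2,3)
echo "$preview"

# Count the data rows: skip the header line, then count what remains.
rows=$(tail -n +2 "$csv_file" | wc -l | tr -d ' ')
echo "rows: $rows"

# Clean up the temporary file.
rm -f "$csv_file"
```

Each step is a single pipeline, which is the kind of one-liner the paragraph above has in mind.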

This ebook makes dealing with CSV files, JSON, or in general any text file much easier.

What's in it for you?

In this ebook, I'm trying to save you time and spare you the hassle of dealing with files at the system level. You may also enjoy the adventure of exploring command-line tools and programs that you may not have heard of. I encourage you to try these tools, as I do on my workdays.

While dealing with the command line may sound a bit geeky, this ebook is simple, easy to follow, and a lot of fun. It includes real examples from a scientific paper, COVID tracking project data, Reddit user data, and more, so you can practice with useful programs and tools from the comfort of your command line.

Content:

  • In this ebook, you'll clean data using the command-line tools tr, grep, sort, uniq, awk, sed, and csvlook, and practice cleaning a COVID-19 CSV file with the command-line programs csvkit and xsv, comparing the performance of each.
  • You'll also see how to sort and concatenate a large CSV file with csvkit and xsv, and measure their performance relative to Pandas.
  • In the last chapter, you'll learn how to clean a JSON file using the command-line program jq.
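To give a flavor of the tools listed above, here is a minimal sketch (not taken from the ebook) that chains tr, sort, uniq, and awk to normalize and count messy values. The function name clean_counts and the sample input are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: reads one value per line on stdin and prints "value,count".
clean_counts() {
  # Lowercase the input, strip carriage returns, group identical values,
  # count them, sort by count (highest first), and emit value,count pairs.
  tr '[:upper:]' '[:lower:]' | tr -d '\r' | sort | uniq -c | sort -rn | awk '{print $2 "," $1}'
}

# Sample input with inconsistent casing.
printf 'Positive\nnegative\nPOSITIVE\nNegative\npositive\n' | clean_counts
```

With this sample input, the three spellings of "positive" collapse into one value with a count of 3, and the two spellings of "negative" into one value with a count of 2.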

Who should take this Product?

If you are a data scientist, data engineer, data analyst, or software developer, or you work with data files a lot (such as TXT, CSV, or JSON), this ebook is for you.

Requirements

You should have a basic understanding of how the terminal works.

Author

Ezzeddin Abdullah

Data Platform Engineer

School

Ezzeddinabdullah's School

One-time Fee
$8

What's Included

File Size: 654K
Pages: 36
Language: English
Level: All levels
Skills: Data Cleaning, Command-line, Bash, Scripting, Data Manipulation
Age groups: All ages
