
A Unix legend who owes us nothing continues to patch the core AWK code

Brian Kernighan speaking at a tribute to Dennis Ritchie, his Bell Labs colleague and co-author of The C Programming Language, in 2012. An image of Ritchie made from dominoes is behind Kernighan.

A Princeton professor, finding some time for himself during the summer academic lull, emailed an old friend a couple of months ago. Brian Kernighan said hello, asked how their visit to the US was going, and dropped off hundreds of lines of code that could add Unicode support to AWK, the text-parsing tool he helped create for Unix at Bell Labs in 1977.

“I’ve tested it quite a bit, but clearly more testing is needed,” Kernighan wrote in the email, which was published in late May as a sort of pseudo-commit on the onetrueawk repo by longtime contributor Arnold Robbins. “Once I figure out how … I’ll try to submit a pull request. I wish I had a better understanding of git, but despite your help, I still don’t have a proper understanding, so it might take a while.”

Kernighan is the “K” in AWK, a special-purpose language for extracting and manipulating text that was key to Unix’s pipeline features and cross-system interoperability. A working awk function (AWK is the language, awk the command that invokes it) is critical to both the Single UNIX Specification and IEEE POSIX certification for interoperability. There are countless variants of awk, including modern Unicode-enabled derivatives, but the “One True AWK,” sometimes known as nawk, is a sort of canonical version based on Kernighan’s 1985 book The AWK Programming Language and his subsequent input.


Copies of The C Programming Language, written by Brian Kernighan and Dennis Ritchie (RIP), at a campus bookstore.

Kernighan is also the “K” in “K&R C,” the seminal 1978 book The C Programming Language that he co-authored with Dennis Ritchie and that still follows programmers around, in mind and in dog-eared paper form. His roots in C go much deeper. Kernighan taught C to Bell Labs workers and convinced its creator, Ritchie, to collaborate on a book to spread the knowledge. That book spawned the “one true brace style,” the endless debates that accompany it, and the structure that underlies every modern programming language.

Kernighan also named Unix and first demonstrated the “Hello, world” code example. He spoke with Ars Technica’s Richard Jensen for a history of Unix’s first 50 years.

The onetrueawk repository, where Kernighan’s code appeared in late May, is a relatively quiet place, with 21 contributors, 46 GitHub followers, and commits every few months. As noted by The Register, Kernighan’s Unicode fix became known mainly because the professor mentioned it in an interview on Computerphile’s YouTube channel.


“It was always confusing that AWK only worked with ASCII or maybe 8-bit input, but it doesn’t actually handle Unicode at all,” Kernighan tells Professor David Brailsford. “A few months ago I spent some time working with (laughs) an incredibly old program. At the moment I have it where it will actually handle UTF-8 input and output so you can have regular expressions that, you know, pick Japanese characters, things like that.”
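To see why a byte-oriented tool struggles with UTF-8, consider the rough sketch below. It is written in Python purely for illustration and says nothing about how onetrueawk implements its Unicode support; the sample string and the helper names `char_count` and `byte_count` are made up for this example. It shows that a single Japanese character occupies several bytes, so a regular expression that works on bytes sees more "characters" than one that works on code points.

```python
import re

# Hypothetical sample text: three ASCII letters, a space, three Japanese characters.
text = "awk 日本語"

def char_count(s: str) -> int:
    """Number of Unicode code points in the string."""
    return len(s)

def byte_count(s: str) -> int:
    """Number of bytes the string occupies once encoded as UTF-8."""
    return len(s.encode("utf-8"))

# A Unicode-aware pattern: "." consumes one code point at a time.
char_units = re.findall(r".", "日本語")

# A byte-oriented pattern: "." consumes one byte at a time,
# so each three-byte character is split into three pieces.
byte_units = re.findall(rb".", "日本語".encode("utf-8"))

# A character class covering hiragana, katakana, and common CJK ideographs
# (an assumed, deliberately simplified range for this illustration).
japanese = re.findall(r"[\u3040-\u30ff\u4e00-\u9fff]+", text)

print(char_count(text), byte_count(text))
print(len(char_units), len(byte_units))
print(japanese)
```

The gap between the two counts is the kind of mismatch a tool has to bridge before its regular expressions can pick out non-ASCII characters reliably.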

Kernighan, now 80, fondly mentions in the interview that he also did something “quick and dirty” to enable AWK to process CSV files.

https://arstechnica.com/?p=1875462
