With recent high-profile security decryption cases, encryption is more important than ever. Much of your browser usage and your smartphone data is encrypted. But what does that process actually entail? And when computers get smarter and faster due to advances in quantum physics, how will encryption keep up? This video can be played during a lesson on identifying common methods of securing data.
We’re going to look at how our computers read and interpret computer files. We’ll talk about how some popular file formats like txt, wave, and bitmap are encoded and decoded, giving us pretty pictures and lifelike recordings from just strings of 1s and 0s, and we’ll discuss how our computers keep all this data organized and readily accessible to users. You’ll notice in this episode that we’re starting to talk more about computer users, not programmers, foreshadowing where the series will be going in a few episodes.
Often files are way too large to be easily stored on hard drives or transferred over the Internet. The solution, unsurprisingly, is to make them smaller. Today, we’re going to talk about lossless compression, which gives you exactly the same thing when reassembled, as well as lossy compression, which uses the limitations of human perception to remove less important data. From listening to music and sharing photos to talking on the phone and even streaming this video right now, the ways we use the Internet and our computing devices just wouldn’t be possible without the help of compression.
This video will go a little meta and talk about how computer science can support learning with educational technology. We here at Crash Course are big fans of interactive in-class learning and hands-on experiences, but we also believe in the additive power of educational technology inside and outside the classroom from the Internet itself.
Unencrypted communication over the Internet works a lot like sending a postcard: it can be read by anybody along the delivery route. Communication is routed through intermediary computers and systems, which are connected to many more computers and systems. Encryption, or encoding information so it appears scrambled to anyone who doesn’t know the key, is a way to wrap a postcard in an envelope. While it can never be 100% secure, stronger encryption makes it harder for people to get to the contents.
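To make the envelope idea concrete, here is a minimal sketch of scrambling a message with a shared key. It uses a toy repeating-key XOR, which is an assumption chosen only for illustration and is not secure; real tools rely on vetted algorithms. The function name `xor_with_key` and the key value are hypothetical.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the repeating key (toy cipher, NOT secure)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


message = "Meet me at noon".encode("utf-8")
key = b"envelope"  # hypothetical shared secret, for illustration only

scrambled = xor_with_key(message, key)    # what someone along the route would see
recovered = xor_with_key(scrambled, key)  # applying the same key again undoes it

print(scrambled)
print(recovered.decode("utf-8"))
```

Anyone holding the key can recover the message, while anyone without it sees only scrambled bytes. A weak scheme like this one is easy to break, which is why the strength of the encryption matters.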
The lesson elements in this module teach students about the privacy principle “Communication over a network, unless strongly encrypted, is never just between two parties”. They are designed to be independent and flexible, so you can incorporate them into any size lesson plan. Student resources are available at https://teachingprivacy.org/someone-could-listen/.
Summary of Learning Objectives: Students can articulate how the multi-step, multi-party pathways of networked communication affect users’ privacy; students can identify and use more secure communication options.
Target Age: High school, college undergraduate.
The word "code" has lots of meanings in computer science. It's often used to talk about programming, and a program can be referred to as "source code". Even binary representation of information is sometimes referred to as a code. However, in this chapter (and the next three chapters), the sense of coding that will be used is about clever representations of information that address a practical issue, such as encrypting the data to keep it secret.
In the previous chapter, we looked at using binary representations to store all kinds of data — numbers, text, images and more. But often simple binary representations don't work so well. Sometimes they take up too much space, sometimes small errors in the data can cause big problems, and sometimes we worry that someone else could easily read our messages. Most of the time all three of these things are a problem! The codes that we will look at here overcome all of these problems and are widely used for storing and transmitting important information.
The three main reasons that we use more complex representations of binary data are:
Compression: this reduces the amount of space the data needs (for example, MP3 compression can reduce an audio file to well under 10% of its original size).
Encryption: this changes the representation of data so that you need to have a "key" to unlock the message (for example, whenever your browser uses "https" instead of "http" to communicate with a website, encryption is being used to make sure that anyone eavesdropping on the connection can't make any sense of the information).
Error Control: this adds extra information to your data so that if there are minor failures in the storage device or transmission, it is possible to detect that the data has been corrupted, and even reconstruct the information (for example, bar codes on products have an extra digit added to them so that if the bar code is scanned incorrectly at a checkout, it makes a warning sound instead of charging you for the wrong product).
Often all three of these are applied to the same data; for example, if you take a photo on a smartphone it is usually compressed using JPEG, stored in the phone's memory with error correction, and uploaded to the web through a wireless connection using an encryption protocol to prevent other people nearby getting a copy of the photo.
Without these forms of coding, digital devices would be very slow, have limited capacity, be unreliable, and be unable to keep your information private.
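The extra bar code digit mentioned under error control can be sketched in a few lines. The example below assumes the EAN-13 scheme, one common product bar code format; the text above does not name a specific scheme, and the function names are hypothetical.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the 13th digit from the first 12 digits of an EAN-13 code."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 digits")
    # Digits are weighted 1, 3, 1, 3, ... from left to right.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    # The check digit brings the weighted total up to a multiple of 10.
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """Return True if the last digit matches the check digit of the first 12."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


print(is_valid_ean13("4006381333931"))  # True: the check digit matches
print(is_valid_ean13("4006381333231"))  # False: one digit was misread
```

Because the weights 1 and 3 share no factor with 10, changing any single digit changes the total modulo 10, so a one-digit scanning error is always detected.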
Data compression reduces the amount of space needed to store files. If you can halve the size of a file, you can store twice as many files for the same cost, or you can download the files twice as fast (and at half the cost if you're paying for the download). Even though disks are getting bigger and high bandwidth is becoming common, it's nice to get even more value by working with smaller, compressed files. For large data warehouses, like those kept by Google and Facebook, halving the amount of space taken can represent a massive reduction in the space and computing required, and consequently big savings in power consumption and cooling, and a huge reduction in the impact on the environment.
Common forms of compression currently in use include JPEG (used for photos), MP3 (used for audio), MPEG (used for videos, including DVDs), and ZIP (for many kinds of data). For example, the JPEG method can reduce photos to a tenth of their original size or less, which means that a camera can store about 10 times as many photos, and images on the web can be downloaded about 10 times faster.
So what's the catch? Well, there can be an issue with the quality of the data – for example, a highly compressed JPEG image doesn't look as sharp as an image that hasn't been compressed. Also, it takes processing time to compress and decompress the data. In most cases, the tradeoff is worth it, but not always.
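As a rough illustration of the space savings, the sketch below compresses some repetitive text with Python's standard `zlib` module, which implements DEFLATE, the method most commonly used in ZIP files. The sample text is made up for the example, and real files will compress more or less well depending on their content.

```python
import zlib

# Made-up, highly repetitive sample text; real data usually compresses less well.
text = ("the rain in spain stays mainly in the plain " * 50).encode("utf-8")

compressed = zlib.compress(text, 9)    # 9 = strongest (and slowest) setting
restored = zlib.decompress(compressed)

print("original bytes:  ", len(text))
print("compressed bytes:", len(compressed))
print("identical after decompression:", restored == text)
```

This is lossless compression, so the quality issue described above does not arise here, but the processing cost does: higher compression levels generally take more time.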
This video shows an entertaining way to introduce computer science to students. It also explains a public encryption key in a way that students can easily grasp. Students find out that things they thought were safe on the Internet are not safe.
In this lesson, students will do a small amount of research to explore a file format, either one currently in use or one from history, and complete a "one-pager" that summarizes their findings. They will also design a computational artifact (video, audio, graphic, etc.) that succinctly summarizes the advantages of their format over similar ones.
This lesson is intended to be a quick, short version of a performance task in which students rapidly do some research and respond in writing. It might take two class days but should not take more. The goal is to develop skills that students will use when they complete the actual Explore PT later in the year.
Students will be able to:
- identify reliable sources of information when doing research.
- synthesize information taken from multiple online sources.
- create an artifact (video, image, slide, poster, etc.) to communicate information about a computing topic.
Note: You will need to create a free account on code.org before you can view this resource.
At some point, we reach a physical limit of how fast we can send bits and if we want to send a large amount of information faster, we have to find a way to represent the same information with fewer bits - we must compress the data.
In this lesson, students will use the Text Compression Widget to compress segments of English text by looking for patterns and substituting symbols for larger patterns of text. After some experimentation students are asked to come up with a process (or algorithm) for arriving at a "good" amount of compression despite the fact that there is no way to know what is best or optimal. In developing a so-called "heuristic approach" to this problem, students will grapple with the tradeoffs in compressing data and begin to develop a sense of computing problems that are “hard” to solve.
This is a big lesson that covers a lot of bases. It should easily take 2 or more days of class. First and foremost it covers two or three topics directly from the CSP framework.
1. lossless compression
The basic principle behind compression is to develop a method or protocol for using fewer bits to represent the original information. The way we represent compressed data in this lesson, with a “dictionary” of repeated patterns, is similar to the LZW compression scheme, but it should be noted that LZW is slightly different from what students do in this lesson. Students invent their own way here. LZW is used in the GIF image file format; zip files typically use DEFLATE, a related dictionary-based scheme.
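The dictionary idea can be sketched in code. This is a minimal illustration under assumptions, not the widget's actual implementation and not LZW: the dictionary is supplied by hand, each symbol is a single character that does not occur in the text, and later patterns may refer to earlier symbols. The names `compress` and `decompress` and the sample text are hypothetical.

```python
def compress(text: str, dictionary: dict) -> str:
    """Replace each pattern with its symbol, in dictionary order."""
    for symbol, pattern in dictionary.items():
        # A symbol must be one character that is not already in the text,
        # otherwise decompression could not tell symbols from real text.
        if len(symbol) != 1 or not pattern or symbol in text:
            raise ValueError(f"unusable dictionary entry: {symbol!r}")
        text = text.replace(pattern, symbol)
    return text


def decompress(text: str, dictionary: dict) -> str:
    """Undo compress() by expanding symbols in reverse order."""
    for symbol, pattern in reversed(list(dictionary.items())):
        text = text.replace(symbol, pattern)
    return text


original = ("pitter patter pitter patter listen to the rain "
            "pitter patter pitter patter on the window pane")
# The second entry is a pattern built from the first symbol.
dictionary = {"#": "pitter patter", "@": "# #"}

packed = compress(original, dictionary)
print(packed)  # @ listen to the rain @ on the window pane
print(decompress(packed, dictionary) == original)  # True
```

A fair size comparison has to count the dictionary as well as the compressed text, since both are needed to rebuild the original.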
2. heuristics
The lesson touches on computationally hard problems and heuristics, but please note that these topics will be revisited later on. A general "hand-wavy" understanding is all that's needed from this lesson.
We do want students to see, however, that there is no single correct way to compress text using the method we use in this lesson because a) there is no known algorithm for finding an optimal solution, and b) we don’t even know a way to verify whether a given solution is optimal. There is no way to prove it or derive it beyond trying all possibilities by brute force. This is an example of an algorithm that cannot run in a “reasonable amount of time” - one of the CSP learning objectives.
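One possible heuristic, sketched here as an assumption rather than as the approach students are expected to find, is a greedy strategy: repeatedly pick the repeated substring whose replacement saves the most characters, counting the cost of its dictionary entry. All names, the symbol set, and the `max_len` limit are hypothetical choices for the example.

```python
def best_pattern(text: str, max_len: int = 12):
    """Return (pattern, saving) for the substring whose replacement saves most."""
    best, best_saving = None, 0
    for length in range(2, max_len + 1):
        for start in range(len(text) - length + 1):
            pattern = text[start:start + length]
            count = text.count(pattern)  # non-overlapping occurrences
            # Each use shrinks by (length - 1) characters; the dictionary
            # entry itself costs the pattern plus one symbol.
            saving = count * (length - 1) - (length + 1)
            if saving > best_saving:
                best, best_saving = pattern, saving
    return best, best_saving


def greedy_compress(text: str, symbols: str = "#@$%&"):
    """Greedily build a dictionary; returns (compressed_text, dictionary)."""
    dictionary = {}
    for symbol in symbols:
        if symbol in text:       # skip symbols that already occur in the text
            continue
        pattern, _ = best_pattern(text)
        if pattern is None:      # nothing left that saves space
            break
        text = text.replace(pattern, symbol)
        dictionary[symbol] = pattern
    return text, dictionary


def expand(text: str, dictionary: dict) -> str:
    """Rebuild the original by expanding symbols in reverse order."""
    for symbol, pattern in reversed(list(dictionary.items())):
        text = text.replace(symbol, pattern)
    return text


sample = ("pitter patter pitter patter listen to the rain "
          "pitter patter pitter patter on the window pane")
packed, found = greedy_compress(sample)
total = len(packed) + sum(len(p) + 1 for p in found.values())
print(packed, found, total, len(sample))
```

The greedy choice gives a good result quickly, but nothing guarantees it is the best possible one; a different order of choices might compress further, which is exactly the tradeoff a heuristic accepts.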
3. foreshadowing programming behaviors
Lastly, the Text Compression Activity is an important lesson to refer back to when students start programming. The activity engages students in thinking and problem-solving behaviors that foreshadow skills that are particularly useful for programming later down the line. In particular, when students recognize patterns that repeat, and then represent those patterns as abstract symbols, and then further recognize patterns within those patterns, it is very similar to the kinds of abstractions we develop when writing functions and procedures when programming. Decoding the message in the warm-up activity is very similar to tracing a sequence of function calls in a program.
Students will be able to:
- collaborate with a peer to find a solution to a text compression problem using the Text Compression Widget (lossless compression scheme).
- explain why the optimal amount of compression is impossible or “hard” to identify.
- explain some factors that make compression challenging.
- develop a strategy (heuristic algorithm) for compressing text.
- describe the purpose and rationale for lossless compression.
Note: You will need to create a free account on code.org before you can view this resource.
In this lesson, students will begin to explore the way digital images are encoded in binary.
Students learn the difference between lossy and lossless compression by experimenting with a simple lossy compression widget for compressing text. Students then research three real-world compressed file formats to fill in a research guide. Throughout the process, they review the skills and strategies used to research computer science topics online, in particular, to cope with situations when they don't have the background to fully understand everything they're reading (a common situation even for experienced CS students).
The first goal of this lesson is straightforward: understand what lossy compression is, how it differs from lossless compression, and when or why it might be used. Students should see a number of examples of this distinction throughout the lesson and should leave the lesson able to describe the relative benefits of each.
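To make the lossy side of the distinction concrete, here is a toy sketch that shortens text by dropping vowels after the first letter of each word. This rule is an assumption picked for illustration and is not necessarily what the lesson's widget does; the function name `drop_vowels` is hypothetical.

```python
def drop_vowels(text: str) -> str:
    """Toy lossy scheme: keep each word's first letter, drop later vowels."""
    words = text.split(" ")
    return " ".join(
        w[:1] + "".join(c for c in w[1:] if c.lower() not in "aeiou")
        for w in words
    )


print(drop_vowels("lossy compression removes less important data"))

# Two different originals collapse to the same output, so the original
# cannot be recovered with certainty: information has been lost.
print(drop_vowels("the bat") == drop_vowels("the bit"))  # True
```

A reader can often still guess the message, which is why lossy compression can be acceptable for photos or audio, but a lossless scheme is needed whenever every character must come back exactly.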
The second goal of this lesson is to build up students' research skills both for the project they will complete in the next lesson and for the Explore PT at the end of the year. Students will need practice finding reliable sources, reading technical articles, and synthesizing information. The teacher's role in calling out the skills being used, not merely the facts being found, is significant.
Students will be able to:
- explain the difference between lossy and lossless compression.
- explain the relative benefits or drawbacks of different file formats, particularly in terms of how they compress information.
- identify reliable sources of information when doing research.
- explain the difference between open source and licensed software.