They were clearly betting on the fact that no one would notice they are there. What scares me is we're just finding this out. How long have criminal organizations and rogue nations known about this and what have they used it for?
Perhaps simply printing a single yellow dot through a few different printers would be enough to accomplish the same thing, then using the resulting paper for "real" prints.
The more I think about it, this could even be a service: "preprinted" paper that has gone through a bunch of printers, each adding its own unique identifier, then sold and distributed.
That or just paying cash for a printer.
Edit: It looks like the Purdue lab was only publishing research from 2003 to 2010, or hasn't updated its web site.
That makes me think that it may have been a mistake to create this list in the first place, because the main practical use of the list would be to help people buy color laser printers that don't do forensic tracking, yet it's not clear that any such printers are actually commercially available.
To clarify, "other" here refers to anything other than a color laser?
Also, are color LED printers included as color laser? (I would think so).
https://en.wikipedia.org/wiki/Printer_steganography
There were press articles about it by 2004 (and I think some earlier), we had written the tool that Rob Graham used to decode these scans by 2005, and I gave a number of TV interviews about it during 2005. A small number of manufacturers (maybe worried about European data protection laws) also alluded to the existence of the technology in their user manuals. Some of the people from industry who contacted me also said that this was common knowledge to people in the printing industry since at least the turn of the millennium.
It would be good to start with
https://engineering.purdue.edu/~prints/publications.shtml
and then see who's cited them in the last few years.
Buy a printer hundreds of miles away from home while on a road trip, pay cash, and then do whatever you want with it: print yourself a hundred million dollars and enjoy your print-irement. :)
Simply being aware of the possibility of tracking goes a long way.
The person who wrote the list in the article mentions (at https://news.ycombinator.com/item?id=14502425, in this thread) that there is a second generation of this technology that doesn't produce microscopically-visible dots.
Perhaps the use of all available colors is involved.
Buying a printer second-hand with cash might cover this situation.
Burner printers...
- Office IT staff all hate printers
- I'm sure you've gaped at the control software that came with the little $40 inkjet you bought (or had to use) at some point
- Even the creator of MINIX cited "buggy printer drivers" as his rationale for preferring a microkernel architecture (all drivers run in userspace) over a monolithic approach (all drivers run in ring 0; the printer driver sits next to the crypto keyring).
So. Printers are terrible.
Remember Brinks' fireproof safes that were absolutely rock-solid but were running fully unpatched WinXP and had a USB port on the side of the keypad for "security updates"? Hackable with a keyboard stuffer that looked like a flash drive. https://news.ycombinator.com/item?id=9961024
I remember reading (somewhere) a conclusion that went along the lines of, Brinks are awesome at making safes - and this safe was truly amazing - but they weren't a software company, to a profound extent.
It's clear to me that printer companies are similarly really, really bad at software design too.
So reverse-engineering the firmware to figure out what printers are doing probably wouldn't be all that difficult.
The printer companies were told to write the code, not write it perfectly and make it impossible to unravel as well.
Of course, if anyone actually takes a crack at this (excuse the pun), that'll change things a bit, but printer firmware is probably at the "open sesame in a big way" stage right now, and the printer industry is huge and slow to change, which suggests reverse engineering could remain trivial for a little while, even after the results are published.
The hand-waving part of this is "if you can determine a portion of this randomness by inspecting the output of the algorithm", because I don't really understand how easy this could be made without knowing the exact underlying signal that the dithering algorithm needs to quantize.
An alternative might be slightly changing some of the values in the matrices at
https://en.wikipedia.org/wiki/Ordered_dithering
in a way that barely reduces perceptual image quality (although I'm not certain how well that can be done). Perhaps there is an algorithm that uses statistics to deduce what matrix was used, and then the perturbations can be read out of the matrix.
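A toy version of the perturbed-matrix idea, as a minimal sketch: it assumes a standard 4x4 Bayer matrix, a grayscale image given as rows of 0-255 integers, and a decoder that gets to look at a flat patch printed at a known gray level. The scheme itself (hiding one bit by swapping the two lowest thresholds) is hypothetical; it only shows that the matrix has freedom that doesn't change average tone.

```python
BAYER4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def perturbed_matrix(bit):
    """Copy BAYER4 and, if bit is 1, swap the thresholds of rank 0 and rank 1.

    A swap keeps the same set of thresholds in every 4x4 tile, so a flat gray
    area gets the same number of dots; only their positions change.
    """
    m = [row[:] for row in BAYER4]
    if bit:
        m[0][0], m[2][2] = m[2][2], m[0][0]
    return m


def ordered_dither(image, matrix):
    """Threshold grayscale rows (0-255) against the tiled matrix.

    Returns rows of 1 (paper left white) or 0 (dot printed).
    """
    n = len(matrix)
    scale = 256 / (n * n)
    return [
        [1 if v > (matrix[y % n][x % n] + 0.5) * scale else 0
         for x, v in enumerate(row)]
        for y, row in enumerate(image)
    ]


def read_bit(dithered):
    """Recover the bit from a flat patch dithered at a level between the
    rank-0 and rank-1 thresholds (8 < level <= 24 for a 4x4 matrix)."""
    return dithered[2][2]


if __name__ == "__main__":
    patch = [[16] * 8 for _ in range(8)]
    for bit in (0, 1):
        assert read_bit(ordered_dither(patch, perturbed_matrix(bit))) == bit
```

Reading the bit back from an arbitrary printed image, rather than from a known flat patch, is exactly the statistical part this sketch sidesteps.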
This is related to research in digital watermarking that's been going on for decades, and I'm definitely not an expert in that or in digital image processing, so I'd love to hear from people who know more.
Nonetheless, looking up close at how printers produce different colors out of CMYK dots, I'm pretty confident that they have some degrees of freedom, that some of them probably don't make much perceptual difference, and that those can probably be used to encode a message.
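One familiar degree of freedom is gray component replacement: how much of the gray shared by C, M and Y is printed with K instead. In the idealized model below, moving that gray onto K leaves the reproduced color unchanged. This is a minimal sketch assuming the naive textbook RGB/CMYK conversion; real toners don't mix this linearly (CMY gray and K gray look different), and hiding a bit in the GCR level is a hypothetical scheme, not something any printer is known to do.

```python
def rgb_to_cmyk(r, g, b, gcr):
    """Naive RGB (0..1) to CMYK with a chosen gray component replacement level.

    gcr in [0, 1] is the fraction of the shared gray component (min of C, M, Y)
    that is printed with black toner instead of C+M+Y.
    """
    c, m, y = 1 - r, 1 - g, 1 - b
    k = gcr * min(c, m, y)
    return c - k, m - k, y - k, k


def cmyk_to_rgb(c, m, y, k):
    """Inverse of the same idealized model (coverages simply add)."""
    return 1 - min(1, c + k), 1 - min(1, m + k), 1 - min(1, y + k)


# Hypothetical encoding: two GCR levels stand for bit 0 and bit 1.
GCR_FOR_BIT = {0: 0.5, 1: 0.6}


def embed_bit(rgb, bit):
    return rgb_to_cmyk(*rgb, gcr=GCR_FOR_BIT[bit])


def extract_bit(cmyk):
    """Estimate which GCR level was used; None if there was no gray component."""
    c, m, y, k = cmyk
    gray = min(c, m, y) + k
    if gray == 0:
        return None
    ratio = k / gray
    return min(GCR_FOR_BIT, key=lambda bit: abs(GCR_FOR_BIT[bit] - ratio))
```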
In the future maybe each pack of paper will have a steganographic tracking code - slight white variations for example, so you could track the paper to the selling shop.
Cameras and scanners could also do this.
And as more and more systems get integrated, they could see that your car and your phone were present at the time in the shop where the paper was sold.
A second-hand printer can still be tracked if you buy it through Craigslist, even through a long chain: the original printer was sold in a shop, CCTV, facial recognition and phone location identified the original buyer, the buyer's Craigslist account sold such a printer second-hand on some date, he was tracked to some location, you were also tracked to the same location at the same time, you were pictured carrying a big package, your high-resolution electricity metering suddenly shows that you have a laser printer running in your house, and that you printed exactly 53 pages on the day of the leak, ...
As you can see, it will become harder and harder to do anything anonymously.
CSI will look naive compared to what will be possible in the future. Infinite zoom and camera viewpoint rotation will be trivial if there's one little camera in every door, every street sign, every corner, every car, every little thing.
The utility to law enforcement is being able to prove a connection between the evidence and a suspect after they've obtained a warrant. Or in this case, it was an NSA-owned printer, so they already had the serial number without needing a warrant.
And if you are foolish enough to register with the manufacturer, the NSA already has your serial number without needing a warrant.
Before today, what was the most likely path to this knowledge? As in, one month ago... and how many 26-year-olds have occasion to learn the details of printers? Nobody uses printers.
Yes, working in that position it would be more likely, but she still could merely be a corner case when it comes to laser printer dot awareness, even within the IC.
I was surprised to see printers being involved--I thought something like this leak would all be digital. I was also surprised that the Intercept would not be more savvy about printer identification because it's been publicized so much over the years.
I wonder how many million extra gallons of yellow toner and ink are wasted every year printing these tracker dots?
Generally, anything that less than half of the population knows about is a secret (e.g., menstruation is still called a "secret" in some circles...), so you shouldn't be confused, just disappointed at how gullible / uninformed the average person is.
(It's trivial to deal with the audio sync issues.)
Cams may have a lot of spatial unreliability, but they have a lot of temporal resolution.
And that's just my stupid off-the-cuff answer, which is already off to a decent start. And there are in fact purely spatial solutions that do work, to which the temporal solutions can be added. The upshot is: don't expect to beat these anytime soon. There are just too many bits to hide in, and so few bits needed for the identification.
For a real example that really works, see, for example, digimarc:
https://www.digimarc.com/support/product/digimarc-guardian-f...
Images can be cropped, rotated, recompressed, scaled, etc. and the digital watermark remains.
Also see: https://en.wikipedia.org/wiki/Digimarc
and read some of their patents, referenced in the Wikipedia article.
Heh. The tagline for this car HUD (http://www.jbl.com/connected-car/CP100+LEGEND.html) says, "Now your car can be on the grid too". That's getting pretty close to your tagline.
But a back-of-the-envelope calculation suggested to me an upper limit of about 1000 kg to 10000 kg of extra toner per year in the United States. However, there are several factors that make me think that even the low figure is an overestimate.
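An estimate like this is just a product of a few factors, which makes it easy to see which assumption dominates. A sketch with parameter names of my own choosing; none of the real inputs behind the 1000 kg to 10000 kg figure are filled in here.

```python
def extra_toner_kg(pages_per_year, tracked_fraction, dots_per_page, grams_per_dot):
    """Yearly toner spent on tracking dots, in kilograms.

    pages_per_year:   pages printed per year
    tracked_fraction: share of those pages that carry tracking dots
                      (e.g. only pages from color laser printers)
    dots_per_page:    yellow dots added to each tracked page
    grams_per_dot:    toner mass of a single dot
    """
    grams = pages_per_year * tracked_fraction * dots_per_page * grams_per_dot
    return grams / 1000
```

Because the result is linear in every factor, halving any one input (say, the fraction of pages that actually carry dots) halves the estimate.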
I agree with your frustration about paying for this. Mako Hill used it as an example of an antifeature
https://wiki.mako.cc/Antifeatures
(It might be more accurate to define antifeatures in terms of buyers' willingness to pay to have the features removed, rather than sellers' insistence on being paid to remove them, since we can't, in fact, routinely pay for many of the antifeatures he mentions to be removed.)
I would say about 20% of the files we sent over had enough recoverable watermark to be useful.
Unfortunately, it might not even be necessary to mark the sheets of paper in order to be able to track them: https://citp.princeton.edu/research/paper/
Right.
> Also, are color LED printers included as color laser? (I would think so).
Yes. We could also perhaps say "color printers other than inkjet or dot matrix".
Also color photocopiers (using the same codes).
So it's nice that you print stuff for your kid, but what about those who print for society's sake?
I would start, but I'm currently not around a printer...
Also, many things that are legal now are only so because people blazed the trail by breaking laws before. Think marijuana, homosexuality... heck, even things like religious freedom, freedom of speech, and democracy a few hundred years ago.
Probably meant they didn't actually have the resources to monitor all calls, so they were requesting people to not plot rebellions on the phone.
Now we're richer than that.
I heard that quote in a Snowden interview, though perhaps he was quoting someone else: it's not because one has nothing controversial to say that one shouldn't support free speech.
The best way of ensuring this isn't to test it, but to simply ask the producer whether they would do this. If they say "no" or refuse to answer then don't buy that product. Not even if you had a printer from a manufacturer that had public firmware and driver code could you be sure by just inspecting the code and the printed output.
If they clearly say they don't watermark output then you probably have to trust them or simply not use printers.
(I fully agree with your point; however, I'd argue that the (relative, back then) lack of digital storage and communication made gathering of information much, much harder back then than it is now - even for the Stasi.)
[1]-https://www.privateinternetaccess.com/blog/2016/09/police-ro...
My previous printer did this, and my current Canon refuses to boot when any cartridge is missing or empty.
I agree. Here are some off-the-top-of-my-head ideas that I literally just came up with:
- if printing an image, drop a few dots in some rows (or columns); data is hidden in the pattern of dropped dots
- if printing text (that is, actual text that gets rendered at the driver or printer firmware level rather than by the OS / text editor), slightly alter the shape of some letters (by adding or dropping a dot) to hide a pattern
- if printing an image, try to hide some data in its FFT (e.g. by adjusting differences between low frequencies and hiding a pattern there)
- if recording a video, slightly alter some otherwise stable global characteristic (like avg brightness of a bunch of consecutive frames in an animated movie)
- if recording a video, screw with timing patterns, as you mentioned
There are just so many properties that the difficulty is probably mostly in picking something that's stable through the usual transformations a document will undergo (e.g. scanning, JPG compression).
Print the same page and compare the signals sent to the motors? Wouldn't that be an easier and more accurately measured proxy for what's actually being printed? One might need the timing data for the jets on an inkjet too, etc.
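The first idea in that list, hiding data in which dots get dropped, can be sketched like this. It assumes a 1-bit bitmap (1 = dot printed) and a non-blind decoder that has the unmarked original to compare against; the one-bit-per-row layout is a hypothetical choice for simplicity, and making it survive scanning and compression is the real difficulty.

```python
def embed_dropped_dots(bitmap, bits):
    """Return a marked copy of bitmap (rows of 0/1, 1 = dot printed).

    Row i carries bits[i]: for a 1, the row's first printed dot is dropped.
    """
    marked = [row[:] for row in bitmap]
    for row, bit in zip(marked, bits):
        if bit:
            if 1 not in row:
                raise ValueError("a row with no dots cannot carry a 1 bit")
            row[row.index(1)] = 0
    return marked


def extract_dropped_dots(original, marked):
    """Read the bits back by comparing each marked row with the original."""
    return [int(o != m) for o, m in zip(original, marked)]
```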
https://www.nytimes.com/2017/06/06/us/politics/reality-leigh...
And none of those would impact the timing of black interim time lengths.
Also, this makes digimarc sound crappy (from their site):
>Facebook compresses images once they are posted, sometimes heavily, which can damage our invisible identifiers. Fortunately, there is a simple solution: if you pre-compress your images, then apply our identified, they should survive.
So they don't survive compression.
But it does not redirect anymore, I had to use archive.org
So for an inkjet you'd have to look at the nozzle timing, which might be difficult depending on how integrated the drivers are (e.g. if they're a custom chip on a flexprint behind the heads... uhm...). For a laser printer you'd have to look at the laser modulation signal. That should be much easier, bugs have done that before.
Reverse engineering the firmware might be easier... on the other hand, the firmware is probably bolted shut rather well, since the printer manufacturer's cartridge DRM is in there somewhere.
Isn't that how they caught Chelsea Manning? Serial numbers from CD-RWs. Also, the store itself doesn't need to associate it with the customer; they just need to know where those CDs were distributed, and the investigators can follow up on the transaction details.
It is incredible how fickle printers are, all the hassle they give, printing problems, network connection problems, special drivers to install even on modern operating systems, paper jams (one would have guessed by now they should have at least solved the paper jams) even on our quite expensive printers.
It's like everything but the print quality got stuck in the year 2000 and never again evolved.
Fun-fact: The Stasi installed Caesium-based gamma ray scanners in some border checkpoints. To this day no one knows for sure how strong the radiation exposure was.
See the reply to SomeStupidPoint by schoen a few hours ago (and a couple of other posts on this thread) for more detail.
It would likely make identifying tracking marks and algorithms a lot easier.
However, I - and evidently many others in this thread - can think of many B&W ways to hide data in a printout.
By the way, if someone wants to take a stab at an older printer's firmware — many Kyocera printers from the late 90s and early 2000s used some small PowerPC with the firmware on a mask ROM on a SIMM-like module. Doubtful that there is anything protected there.
You could even make a printer with open source hardware: something like this, but higher resolution: https://www.youtube.com/watch?v=zX09WnGU6ZY, or think http://reprap.org/ (homemade 3D printers made only from commonly available and 3D-printed parts).
"Secret Dots from Printer Outed NSA Leaker"
Can be had for free from businesses throwing them out, ribbons are readily available, plus they can be used as generic printers (limited graphics capability, but supported on both Windows and CUPS) or by writing directly to /dev/lp0
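Writing directly to the device node can be as simple as the sketch below. It assumes a Linux printer exposed as /dev/lp0 that accepts plain ASCII (typical for dot-matrix printers) and that your user has write permission on the device; CR+LF line endings and a trailing form feed are common conventions for such printers, not something every model requires.

```python
def to_printer_bytes(text):
    """Encode plain text for a simple line printer: ASCII, CR+LF line endings,
    and a trailing form feed to eject the page."""
    lines = text.splitlines()
    data = "".join(line + "\r\n" for line in lines) + "\f"
    return data.encode("ascii", errors="replace")  # non-ASCII becomes "?"


def print_raw(text, device="/dev/lp0"):
    """Write straight to the printer device node, bypassing any spooler or driver."""
    with open(device, "wb") as lp:
        lp.write(to_printer_bytes(text))
```

Keeping the encoding separate from the write makes it easy to check the exact bytes before anything reaches the printer.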
I'm not sure if dot matrix was in the study (I want to say it was inkjet?) but the principle remains the same.
Not as precise as embedded serial numbers with watermarking, but it could get a whistle blower identified.
More annoying are the privacy concessions that are the result of secret anti-counterfeiting measures (which is what I assume the measures are for).
So they needed to make sure everyone assumed they were under surveillance all the time, to keep them from doing anything undesirable (to the state, that is) while not being watched.)
IIRC the Stasi had a hard time connecting the dots (pun not intended) as the massive data sets mostly existed on file cards.
Today's problem is somewhat different: You've got loads of data, you've got the means to rapidly search and index it - but still, for some reason or the other, massive data collection doesn't appear to lead to much by way of desirable (to the populace, that is) results - actual terrorists apprehended, actual conspiracies unearthed, etc.
A cynic would assume that means the data is collected for other, more nefarious purposes. Cough.
Reversing the firmware though, good call.
- already suspect you, or
- can trace the serial back to the purchase.
Conclusion: buy your printer second hand and don't get caught.
There's a lot of work that would go into designing, implementing, and testing this. Then you've got logistics in manufacturing. That's time and money.
But what I'm seeing in the discussions here is a ton of uncertainty about what models even use tracking methods and what methods they use. So I'm guessing this is mostly because we don't know what the software in these printers is doing.
I'm not sure what would be possible for pure vector graphics.
https://duckduckgo.com/?q=docucolor+"not+visible+under+norma...
We originally found that in German,
https://duckduckgo.com/?q=docucolor+"unter+normalen+bedingun...
However, I don't think that most printers currently disclose this, at least for sales in the U.S.
The GSA Schedules: https://www.gsa.gov/portal/content/197989
One way I can think of is to record data on the CMYK pins of the inkjet head itself. IIRC, they activate between 17 V and 22 V, and fire on each high pulse.
The goal here is to make the printer think it's printing, while recording all the data of the pulse operations. We would get a lengthy file out.
Ideally, the pulse coding should be consistent when printing the same image. "Printing" the same thing multiple times could reveal embedded time/date codes.
I should also be able to compare underlying system internals, using multiple clones of a VM with small config details changed. They should produce the same data. If they don't, we know it's encoding system details.
But yeah, there is a way to attack this, and that's by going lower in the stack and treating the printers as a black box. It's not the best way, but a way I've thought of that could at least detect this new technique.
Printing in different inks also wouldn't show us a way to diff 2 printed images. Whereas, saving the pulses from the CMYK pins would do that.
When you have a datalog of lots of pulses that represent a picture, you can back-calculate it into an image. You can also diff it without relying on losing data from scanning (or paying attention to the wrong thing). And with enough samples, we can recalculate the algorithm. With the knowledge of what they're doing, we can then start scanning other images for this... But only once we know what they're doing.
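The diffing step could look something like the sketch below. It assumes each capture has already been reduced to a list of (raster_line, nozzle, channel) firing events, one per dot; that format is hypothetical, and getting clean captures off the head in the first place is the hard part, not shown here.

```python
from collections import Counter

# Hypothetical capture format: one (raster_line, nozzle, channel) tuple per
# firing, with channel one of "C", "M", "Y", "K".


def diff_jobs(job_a, job_b):
    """Return dots fired only in job_a and only in job_b, sorted.

    For two captures of the same document, anything left over is a candidate
    for data the printer added (serial number, date/time, ...).
    """
    a, b = set(job_a), set(job_b)
    return sorted(a - b), sorted(b - a)


def differing_dots_by_channel(job_a, job_b):
    """Count the differing dots per ink channel, e.g. to spot stray Y dots."""
    only_a, only_b = diff_jobs(job_a, job_b)
    return Counter(channel for _, _, channel in only_a + only_b)


def to_bitmap(job, channel, lines, nozzles):
    """Back-calculate one channel of a capture into a 2D 0/1 raster."""
    grid = [[0] * nozzles for _ in range(lines)]
    for line, nozzle, ch in job:
        if ch == channel:
            grid[line][nozzle] = 1
    return grid
```

Diffing events directly avoids the information loss of scanning, which is the advantage the comment describes.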
Also to convince anyone that it works, you would need to test it out on an extremely large number of printers, including ones of the same model. In practice that could be expensive.
There were magazine articles, newspaper articles, and news site discussions about this years ago. They covered it being added to stop color laser printers and dye sublimation printers from being used for currency counterfeiting. That the tech community has such a short communal memory astounds and saddens me.
Even beyond the public knowledge of this tactic, that Reality Winner was working at an intelligence agency and was silly enough to think said intelligence agency couldn't track what had been printed in its own offices is laughable. Either she had no business working in that environment as she clearly doesn't understand their mission and methods or she's a scapegoat.
* 2014 - PC World - http://www.pcworld.com/article/229647/counterfeit_money_on_c...
* 2004 - PC World - http://www.pcworld.com/article/118664/article.html
* 2005 - Washington Post, stating it had been in use at least ten years, and that at least one version of the yellow dot code had been broken. - http://www.washingtonpost.com/wp-dyn/content/article/2005/10...
* 2004 - Slashdot - https://hardware.slashdot.org/story/04/02/06/1513255/hp-disc...
* 2004 - Geek.com - https://www.geek.com/news/color-laser-printers-allow-feds-to...
I could probably easily find more.
I think it's possible to send saturated pixels using PCL, and tell the printer to disable half-toning. It requires that a full page fits in memory, which isn't much (512MB) but typically more than the default.
For some reason, all printers use really vintage memory, so 512MB of extra memory is crazy-expensive.
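For a sense of scale, here is the arithmetic for one uncompressed CMYK page buffer. US Letter paper and the listed resolutions and bit depths are assumptions for illustration; real printers may render in bands or compress, so they don't necessarily need a whole page in memory.

```python
def page_buffer_bytes(width_in, height_in, dpi, channels=4, bits_per_channel=8):
    """Bytes for one uncompressed raster page (no banding or compression)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * channels * bits_per_channel // 8


if __name__ == "__main__":
    for dpi in (600, 1200):
        for bits in (1, 8):
            size = page_buffer_bytes(8.5, 11, dpi, bits_per_channel=bits)
            print(f"US Letter, {dpi} dpi, {bits}-bit CMYK: {size / 2**20:.1f} MiB")
```

Under these assumptions, 1-bit CMYK at 600 dpi needs about 16 MiB, while 8-bit CMYK at 1200 dpi comes to about 514 MiB for a single page.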
Does bunnie do any non-cool projects?? He inspires me more than any other developer/researcher today.
However, the governments did not succeed in limiting their technology to use in counterfeiting investigations, and may not even have attempted to do so.
(Maybe https://en.wikipedia.org/wiki/Invention_Secrecy_Act if they filed for a patent.)
But I would imagine that a simple 3D-printed harness would work a lot better for recording signals while making the printer think it's printing. Then the bypass harness could have an ARM chip on it and spool instructions to either an SD card or USB serial.
The goal here is to be as transparent as possible, in case there are other security systems that try to detect this attack. But I'd guess they haven't got to that point yet.
If you were paranoid enough to pay cash for a printer somewhere far from home because of the tracking issue, you will probably block it from sending messages outside of your LAN.
The "yellow dot" method would be picked up pretty quickly by the yellow being triggered while printing entirely B&W documents.
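On a scan, that check might look roughly like this: flag pixels whose blue channel sits well below red and green, since yellow toner absorbs blue while black, gray and white reflect all three channels about equally. A minimal sketch; the 40-level gap is an arbitrary assumption, and faint tracking dots need a high-resolution scan to show up at all.

```python
def yellow_pixels(pixels, min_gap=40):
    """Return (x, y) of pixels in an RGB scan that look yellow rather than gray.

    pixels: rows of (r, g, b) tuples with 0-255 values. Neutral pixels have
    r, g and b roughly equal; yellow ones have b well below both r and g.
    """
    hits = []
    for y, row in enumerate(pixels):
        for x, (r, g, b) in enumerate(row):
            if min(r, g) - b >= min_gap:
                hits.append((x, y))
    return hits
```

Any hit on a page that was sent as pure black and white is worth a closer look.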
Things like dithering, if they encode things like printer serial numbers, might be catchable by printing an identical set of documents on two units of the same printer model.
That's what my shim would do: record the signals to the ink solenoids. The reason to make the printer think it's printing is primarily all the DRM lockout crap all manufacturers use. Ideally, I'd even let it print, so that when the firmware sees ink levels going down, nothing on the firmware side would look amiss.
> The "yellow dot" method would be picked up pretty quickly by the yellow being triggered while printing entirely B&W documents.
Indeed. However, the yellow dots are old hat. I know they've moved on to something much harder to detect, and likely also resilient to scanning and reprinting. What is this new type? No bloody clue. I'd assume a bad actor using heavy stego on chip. And if I were designing it, I'd watch for things like test images coming through and not mark them.
My first attempt would be with a high-res scan of a $100 bill. I bet that'd trigger something interesting.
> Things like dithering, if they encode things like printer serial numbers might be catchable by printing an identical set of documents across two examples of the printer.
Yeah, I figure there's a serial number, time/date, hostname, IP, logged-in username... All sorts of data. This is corporate espionage territory as well as national security, so I'd figure they would pull out all the stops to catch the leaker, if they can't prevent the print itself.
Just chalk it up to me and my paranoid mind. Still doesn't mean they aren't out to get you!
I wonder if printing a blank page and then printing the document on that same page would destroy the pattern.
It seems there is an opportunity here to create a program that prints a random layer of light yellow dots on a blank page.
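A starting point for such a program, as a minimal sketch: it writes a random set of pale yellow dots to a plain PPM image that you would then print at the printer's native resolution. The dot density, the color, and whether the driver renders it with yellow toner only are all assumptions, and whether random noise actually masks a given manufacturer's pattern (rather than just adding dots a decoder can ignore) is untested.

```python
import random

PALE_YELLOW = (255, 255, 160)  # RGB; assumes the driver prints it mostly with Y toner


def random_yellow_layer(width, height, density=0.002, seed=None):
    """Pick random (x, y) pixel positions for dots; density is dots per pixel."""
    rng = random.Random(seed)
    count = int(width * height * density)
    return {(rng.randrange(width), rng.randrange(height)) for _ in range(count)}


def to_ppm(width, height, dots, color=PALE_YELLOW):
    """Render the dots on a white page as a binary PPM (P6) image."""
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    pixels = bytearray(b"\xff" * (width * height * 3))  # start with a white page
    for x, y in dots:
        i = (y * width + x) * 3
        pixels[i:i + 3] = bytes(color)
    return header + bytes(pixels)
```

Writing `to_ppm(w, h, random_yellow_layer(w, h))` to a file gives a printable layer; passing a fixed `seed` makes it reproducible, which for this purpose you probably don't want.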
You'd think so. After all, Stallman created FSF in part because of his frustrations with a printer!
> In 1980, Stallman and some other hackers at the AI Lab were refused access to the source code for the software of a newly installed laser printer, the Xerox 9700. Stallman had modified the software for the Lab's previous laser printer (the XGP, Xerographic Printer), so it electronically messaged a user when the person's job was printed, and would message all logged-in users waiting for print jobs if the printer was jammed. Not being able to add these features to the new printer was a major inconvenience, as the printer was on a different floor from most of the users. This experience convinced Stallman of people's need to be able to freely modify the software they use.