Double door system when copying files between qubes

tokideveloper · May 5, 2024, 12:59pm

Hello,

Before editing the documentation page “How to copy and move files | Qubes OS” by adding a new section, I would like to discuss that section here.

BEGIN

Double door system

Imagine you want to copy a secret file from a more trusted qube X to a less trusted qube Z. Copying the wrong secret to qube Z could be disastrous. To avoid such accidents, you could install a double door system: set up an intermediate qube Y “between” qube X and qube Z. Copy the secret file from qube X to qube Y and then move it from qube Y to qube Z. Do not copy it from X to Z directly (this can be enforced with the qrexec policy manager). This way, while the file is in qube Y, you can check that it really is the right file before moving it to qube Z.

Another use case for a double door system: Imagine you want to copy a file from a less trusted qube X to a more trusted qube Z but sanitize it first (i.e. make it more trusted). If you sanitize the file in qube X, an attacker could replace the sanitized version with a malicious one again before you can copy it to qube Z. If you sanitize the file in qube Z, qube Z could become compromised if you accidentally open the file the normal way (not in a disposable qube). So, a solution could be a double door system: Set up an intermediate qube Y “between” qube X and qube Z and allow copying from qube X to qube Y and from qube Y to qube Z but not from qube X to qube Z directly. Then copy your file from qube X to qube Y, open it there in a disposable qube, sanitize the file in that disposable qube and finally move the sanitized file from qube Y to qube Z. If you have accidentally opened the unsanitized version of that file in qube Y, you can throw qube Y away without losing any data in qube X or Z.

END
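To make the "no direct copy" rule concrete, here is a minimal sketch that models the three rules and renders them as policy lines. It assumes the Qubes 4.1+ qrexec policy line format (service, argument, source, destination, action) and first-match-wins evaluation. The qube names `qube-x`, `qube-y`, `qube-z` and the helpers `double_door_rules`, `decide` and `render` are hypothetical, not part of any Qubes tool. It uses `ask` rather than `allow` so the confirmation prompt is kept; that is a choice of this sketch, not something the proposal requires.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    source: str
    destination: str
    action: str  # "allow", "ask" or "deny"


def double_door_rules(src: str, middle: str, dst: str) -> list[Rule]:
    """Rules for src -> middle -> dst with the direct src -> dst hop denied."""
    return [
        Rule(src, middle, "ask"),
        Rule(middle, dst, "ask"),
        Rule(src, dst, "deny"),
    ]


def decide(rules: list[Rule], source: str, destination: str,
           default: str = "deny") -> str:
    """Return the action of the first matching rule (first match wins)."""
    for rule in rules:
        if rule.source == source and rule.destination == destination:
            return rule.action
    return default


def render(rules: list[Rule], service: str = "qubes.Filecopy") -> str:
    """Render one policy line per rule: service, argument, source, destination, action."""
    return "\n".join(
        f"{service} * {r.source} {r.destination} {r.action}" for r in rules
    )


# Hypothetical qube names, for illustration only.
rules = double_door_rules("qube-x", "qube-y", "qube-z")
print(render(rules))
```

Any generated lines should be checked against the official qrexec policy documentation before being placed in a policy file, since rule order relative to the default policy matters.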

Of course, the idea also applies to copying text via the clipboard.

What do you think? Is it worth mentioning the idea of such a double door system? Is there a better solution? Is there a security benefit at all? Or is it even more insecure?

Best regards,
tokideveloper

qubist · May 5, 2024, 3:36pm

What do you think?

You suggest something that goes against the essence of secrecy:

Imagine, you want to copy a secret file from a more trusted qube X to a less trusted qube Z.

and based on that you invent an overcomplicated “safe” procedure to protect that same data.

The other example also contradicts essential principles:

Imagine you want to copy a file from a less trusted qube X to a more trusted qube Z but sanitize it before (i.e. make it more trusted).

If you look at this more carefully, it really means nothing - “more trusted”. How much is “more”? Is Z’s “more” the same as the sanitized file’s “more”? How is that evaluated? How is the file actually sanitized and how trusted is the sanitization (short/long term)? What is the actual criterion by which the sanitized file’s “more” becomes trusted enough to pass a threshold and enter Z? What valid reasons exist for anyone to even risk doing that, considering there is no practical limit to the number of qubes one can use? Etc.

Is there a better solution?

Keep clean water in one pot, dirty water in another. If you mix them, the result is always dirty water. - Simplest security doc ever

nokke · May 5, 2024, 4:02pm

I like the sanitizing pipeline idea. I’ve seen some posts here talking about doing this sort of thing with emails routinely.

The user error side, I don’t see from your example why you’d draw the line where you do - you’re likely to copy the wrong file, but likely to catch that mistake in an intermediate qube. Why/how?

However, I made some worrying clipboard mistakes early on that I might have caught in an intermediate space, so maybe there’s something that could be added here. Whether it’s new patterns or functionality or user education or what, I couldn’t say. I wouldn’t want to add extra complexity here without clear benefit.

tokideveloper · May 5, 2024, 5:37pm

@qubist wrote:

Keep clean water in one pot, dirty water in another. If you mix them, the result is always dirty water. - Simplest security doc ever

That’s what I want to get: Not mixing the files. Concerning both examples I’ve given, I imagine an investigative journalist who receives secret documents from a source, wants to blacken out sensitive data on these documents and then publish them.

So, in a first step, the journalist wants to read the documents and blacken out sensitive data. So, they could use the command-line tool qvm-convert-pdf. But that tool sends the sanitized documents back to the qube where they came from. (Ideally, it should receive the name of the qube where to store the sanitized files as an argument. Then an extra intermediate qube is not needed). Thus, when executed in a “less trusted” environment, other (unsanitized) files and the sanitized ones are in the same qube, which is bad. Similar when copying the unsanitized files to a qube where sanitized documents belong to. That’s what my second example in my first post is about.

In a second step, the journalist wants to reveal the blackened out documents. Revealing the wrong document could be harmful. So, copying them first in an intermediate qube, they can double-check that they have not copied the wrong files mistakenly. If so, deleting all the files that shall not be published is safe in that intermediate qube (assuming they have been really copied and not moved to that qube). After checking again, it’s relatively safe to reveal the desired documents and it’s more unlikely to reveal any wrong files. That’s what my first example in my first post is about.
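The double-check in the intermediate qube could be backed by a mechanical comparison in addition to reading the files. The sketch below assumes the journalist has noted the SHA-256 digests of the files approved for publication. The names `sha256_of`, `unexpected_files`, `staging_dir` and `approved_hashes` are hypothetical. A digest match only shows that a file is the one that was approved, not that its redactions are adequate.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unexpected_files(staging_dir: Path, approved_hashes: set[str]) -> list[Path]:
    """Return files under staging_dir whose digest is not in approved_hashes."""
    return sorted(
        path
        for path in staging_dir.rglob("*")
        if path.is_file() and sha256_of(path) not in approved_hashes
    )
```

Anything the function returns would be a candidate for deletion in the intermediate qube before the remaining files are moved on.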

tokideveloper · May 5, 2024, 5:40pm

@nokke wrote:

I like the sanitizing pipeline idea. I’ve seen some posts here talking about doing this sort of thing with emails routinely.

Thank you! If other people do such things, it might be worth instructing newbies?

The user error side, I don’t see from your example why you’d draw the line where you do - you’re likely to copy the wrong file, but likely to catch that mistake in an intermediate qube. Why/how?

It’s not perfect, that’s correct. But an intermediate step could give you time to take a deep breath and take one step back. In that intermediate qube, you can safely delete files. I think, this is also mentally important.

However, I made some worrying clipboard mistakes early on that I might have caught in an intermediate space, so maybe there’s something that could be added here. Whether it’s new patterns or functionality or user education or what, I couldn’t say. I wouldn’t want to add extra complexity here without clear benefit.

Maybe, it could be written on a new “Hardening”, “Paranoid tips” or “Best practices” page in the documentation?

renehoj · May 5, 2024, 5:45pm

You can right-click documents and open them in a disposable vm.

tokideveloper · May 5, 2024, 5:58pm

@renehoj wrote:

You can right-click documents and open them in a disposable vm.

I know. But how does this help concerning my proposal? Not sanitizing the files you want to open often means that you expose yourself to danger every time you open those files.

nokke · May 5, 2024, 6:06pm

Not that I’m a product owner here, but I’d want to see more clearly defined use cases before adding to the documentation load. I can sympathize with the creating breathing space idea but I’m not sure it’s concrete enough. Being able to demonstrate increased user confidence would be a win, though. I think the forum’s a good place for thrashing out whether there actually are cases worth handling.

tokideveloper · May 5, 2024, 9:15pm

@nokke wrote:

Not that I’m a product owner here, but I’d want to see more clearly defined use cases before adding to the documentation load. I can sympathize with the creating breathing space idea but I’m not sure it’s concrete enough. Being able to demonstrate increased user confidence would be a win, though. I think the forum’s a good place for thrashing out whether there actually are cases worth handling.

I see. To the other forum members: What user confidence / security gains do you see in a double door system as described above?

A question to the developers of Qubes OS: Concerning sanitizing files, IMHO the best way would be that a disposable qube will be created in which the conversion happens (after copying the original document to it) and from where the sanitized files will be moved to a dedicated destination qube automatically. Of course, policies should be as strict as possible at any time and the disposable qube should be destroyed after each conversion run. What do you think about it? Is it feasible?

solene · May 5, 2024, 11:25pm

Isn’t that exactly what’s happening when you right click on a picture and choose “Convert to trusted img”?

slcoleman · May 6, 2024, 2:31am

It’s never a good idea to try and downgrade complex documents like PDF, but sometimes it is necessary.

My path forward would be to set up a specific template for this process and base a disposable AppVM on that specialized template.

Install something like PDF Studio Pro in that dispVM template /opt. Do not install it in the DispVM/AppVM /rw volume. You want it read only.
Open a terminal in that DispVM.
Right click on that document and copy/move it to that DispVM instance.
Open that document in the DispVM with PDF Studio and black out the text you need to hide, and save it.
qvm-convert the document to a “safer” version.
Click on that “safe” file and move it to a secondary target DispVM.
Verify the document is readable in that target VM. Inspect it with a hex editor if necessary.
Erase all prior copies of the document and close the DispVM instances.
Move the document to its final storage VM or storage media and close that secondary AppVM.
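As a rough complement to inspecting the converted file with a hex editor, a byte-level scan can count PDF name tokens that usually indicate active or embedded content. This is only a crude sketch: the token list is an assumption about what is worth flagging, names can be obfuscated or hidden inside compressed streams, and short tokens such as `/AA` can match unrelated names. A clean result is therefore not proof of safety. `count_suspicious_names` is a hypothetical helper.

```python
# PDF name tokens commonly associated with scripts, automatic actions
# and embedded files. The selection is illustrative, not exhaustive.
SUSPICIOUS_NAMES = (
    b"/JavaScript",
    b"/JS",
    b"/OpenAction",
    b"/AA",
    b"/Launch",
    b"/EmbeddedFile",
)


def count_suspicious_names(data: bytes) -> dict[str, int]:
    """Count raw occurrences of each suspicious name token in the PDF bytes."""
    return {name.decode("ascii"): data.count(name) for name in SUSPICIOUS_NAMES}


def looks_inert(data: bytes) -> bool:
    """True if none of the tokens occur; a weak hint, not a guarantee."""
    return not any(count_suspicious_names(data).values())
```

A file that has really been rasterized would be expected to contain none of these tokens, so any non-zero count is a reason to stop and look more closely.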

deeplow · May 6, 2024, 11:05am

This is on the Dangerzone Qubes-integration roadmap, so keep an eye on it

unman · May 6, 2024, 2:22pm

No - convert will sanitize file in disposable but return file to calling qube.
Proposal is to sanitize in disposable and then pass to some other qube.

I never presume to speak for the Qubes team.
When I comment in the Forum I speak for myself.

tokideveloper · May 6, 2024, 3:22pm

Thank you all for your replies! Dangerzone sounds very interesting and is exactly what I mean in my second example. Cool stuff!

Concerning my first example, maybe it’s worth posting it to the category “Community Guides” in this forum. And as long as Dangerzone is not there yet, posting the second example to “Community Guides”, too, has value. Or is there a way to change the category of this thread?

@slcoleman: Thank you for sharing your proposed workflow. IMHO editing a PDF document without sanitizing it BEFORE could be harmful.

qubist · May 6, 2024, 4:21pm

That’s what I want to get: Not mixing the files. Concerning both examples I’ve given, I imagine an investigative journalist who receives secret documents from a source, wants to blacken out sensitive data on these documents and then publish them.

That’s not sanitizing but redacting. Sanitizing, the way you explained it, implies removing something that would be dangerous to the qube it is being sent to. Redacting serves a completely different purpose and a redacted document is just as “dirty” as the non-redacted one.

The safe workflow would be:

Receive the file in networked qube, using a particular identity (‘receiver’)
Open the file in an offline disposable to review it
Redact the file in another offline qube (‘redactor’).
Store the redacted version in offline qube ‘redacted-docs’.

When revealing time comes

Copy the redacted file from ‘redacted-docs’ to offline disposable and view it to double check it is the right one
Copy to ‘publisher’ qube and reveal it.

IOW, the double-door is a multi-door and does not involve copying less trusted data to a more trusted qube.

slcoleman · May 6, 2024, 5:11pm

@slcoleman: Thank you for sharing your proposed workflow. IMHO editing a PDF document without sanitizing it BEFORE could be harmful.

Agreed. Doing anything with a PDF could potentially be harmful.

It’s a little more complicated but possible to build an alternate and safer methodology. Make the document safe first in stage 1, transfer to stage II, split out the image pages, use graphical software (e.g. GIMP) to cut/remove all the “secret” parts of the bitmap images embedded in the document, recompose/rejoin the PDF image pages into one document. If searchable text or “cut and paste” of text is required for publishing or working with the document, then use the PDF editor to OCR(*) the safer ‘image containing document’ and produce an overlay of invisible text over that image that can actually be worked with.

Transfer unsafe document to stage1 DispVM

qvm-convert-pdf PDF Text to a PDF image document;

Transfer the “safe” PDF image document to stage2 DispVM

Split document pages into a series of images

Remove/black-out secret parts from embedded PDF image data,
Rejoin image series of pages into an image PDF

OCR Convert image PDF adding the hidden text overlay

Transfer the recovered document to its final storage VM/media.

Now you can index documents, search text, and even cut and paste the data into news stories, emails, or legal documents, etc. Things that a reporter might need to do but all private data and potential malware has been scrubbed from the document.

This is a process I am currently using for old medical journal articles that come only as a scanned series of images in a PDF wrapper, with no actual text. Once the OCR overlay is produced I can index, cut and paste any text as desired.
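The split, rejoin and OCR steps above could be scripted inside the stage2 DispVM roughly as follows. The tool choice is an assumption of this sketch, not something named in the post: `pdftoppm` (poppler-utils) for splitting, `img2pdf` for rejoining and `ocrmypdf` for the text overlay. The functions only build argument lists. The manual black-out of the page images happens between the split and the rejoin.

```python
from pathlib import Path


def split_command(pdf: Path, page_prefix: Path, dpi: int = 300) -> list[str]:
    """pdftoppm writes one PNG per page, with names derived from page_prefix."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf), str(page_prefix)]


def rejoin_command(pages: list[Path], out_pdf: Path) -> list[str]:
    """img2pdf wraps the (already redacted) page images into one image PDF.

    Pages are sorted by name; check that this matches the page order.
    """
    return ["img2pdf", *(str(p) for p in sorted(pages)), "-o", str(out_pdf)]


def ocr_command(image_pdf: Path, searchable_pdf: Path) -> list[str]:
    """ocrmypdf adds an invisible text layer over the page images."""
    return ["ocrmypdf", str(image_pdf), str(searchable_pdf)]
```

Each list can be passed to `subprocess.run(cmd, check=True)` so that a failing step stops the pipeline instead of silently producing a partial document.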

tokideveloper · May 6, 2024, 6:52pm

That’s what I want to get: Not mixing the files. Concerning both examples I’ve given, I imagine an investigative journalist who receives secret documents from a source, wants to blacken out sensitive data on these documents and then publish them.

That’s not sanitizing but redacting. Sanitizing, the way you explained it, implies removing something that would be dangerous to the qube it is being sent to. Redacting serves a completely different purpose and a redacted document is just as “dirty” as the non-redacted one.

There is a misunderstanding IMHO. I’ve given and focused on two examples where and how a double-door could help (sanitizing and doublechecking). The story of the investigative journalist is just a scenario where those two examples of double-doors are used. The process of redacting is not part of those two examples but part of the scenario.

The safe workflow would be:

Receive the file in networked qube, using a particular identity (‘receiver’)

Open the file in an offline disposable to review it

Redact the file in another offline qube (‘redactor’).

Store the redacted version in offline qube ‘redacted-docs’.

When revealing time comes

Copy the redacted file from ‘redacted-docs’ to offline disposable and view it to double check it is the right one

Copy to ‘publisher’ qube and reveal it.

I see, your “safe workflow” given here is different from my scenario. You can, of course, consider all qubes to have the same security level. But I think it is more secure if you sanitize the file before step 2. Opening a potentially malicious file could be harmful for the whole system. And: Since a sanitized file is more trustworthy, you can save it to a “more trusted” qube (whatever “more trusted” means to you).

IOW, the double-door is a multi-door and does not involve copying less trusted data to a more trusted qube.

When connecting several double-doors in a row, it looks like a multi-door, of course.

tokideveloper · May 6, 2024, 6:55pm

@slcoleman: Thank you for sharing your proposed workflow. IMHO editing a PDF document without sanitizing it BEFORE could be harmful.

Agreed. Doing anything with a PDF could potentially be harmful.

It’s a little more complicated but possible to build an alternate and safer methodology. Make the document safe first in stage 1, transfer to stage II, split out the image pages, use graphical software (e.g. GIMP) to cut/remove all the “secret” parts of the bitmap images embedded in the document, recompose/rejoin the PDF image pages into one document. If searchable text or “cut and paste” of text is required for publishing or working with the document, then use the PDF editor to OCR(*) the safer ‘image containing document’ and produce an overlay of invisible text over that image that can actually be worked with.

Transfer unsafe document to stage1 DispVM

qvm-convert-pdf PDF Text to a PDF image document;

Transfer the “safe” PDF image document to stage2 DispVM

Split document pages into a series of images

Remove/black-out secret parts from embedded PDF image data,
Rejoin image series of pages into an image PDF

OCR Convert image PDF adding the hidden text overlay

Transfer the recovered document to its final storage VM/media.

Now you can index documents, search text, and even cut and paste the data into news stories, emails, or legal documents, etc. Things that a reporter might need to do but all private data and potential malware has been scrubbed from the document.

This is a process I am currently using for old medical journal articles that come only as a scanned series of images in a PDF wrapper, with no actual text. Once the OCR overlay is produced I can index, cut and paste any text as desired.

This is a good workflow, I think. Thank you for this new version! However, I think this is going off-topic here.

deeplow · May 7, 2024, 9:49am

This is something that we’re considering for Dangerzone as well. But if implemented it will be more mid-term, and there are still open questions about the safety of redactions even with rasterization. A lot of this can only be ensured with proper user education, and when the threat model includes steganographic watermarks (like printer dots), retyping the whole document is still an approach. Adding redaction capabilities can make users over-estimate its power to really redact information.

So there are many many considerations until this can be fully considered as an actually safe alternative.

For those interested in driving that conversation forward, join us here:

github.com/freedomofpress/dangerzone

Evaluate Dangerzone's Potential as a Redaction Tool (and add redaction capabilities)

opened 08:42PM - 02 Apr 24 UTC

deeplow

Dangerzone's goal is protecting the user against malware. However, through the way it works, it also removes metadata. So it can also help with publication security.

### The problem

Typical PDF manipulation tools have poorly implemented redaction methods that can be reversed. Because Dangerzone already rasterizes documents, it has nothing to lose. When a black box is applied and then rasterized, there is no more information in the final output. This is best put in the paper _[Story Beyond the Eye: Glyph Positions Break PDF Text Redaction](https://petsymposium.org/2023/files/papers/issue3/popets-2023-0069.pdf)_ (emphasis added):

> **Rasterization appears to be an effective defense against deredaction.** In many cases this defense is infeasible because it removes searchable text data from the document, however, performing OCR on the document post-redaction can act as a stop-gap for this issue. Rasterization algorithms may also modify or ignore certain glyph shifts, requiring the analyst to perform more reverse engineering to identify the specific rasterization tool used.

We're working on turning Dangerzone into a [file view](https://github.com/freedomofpress/dangerzone/issues/424) and that could be the perfect chance to add redaction tools.

### User Story

As a journalist, I'd like to use Dangerzone to help redact documents, ensuring that redactions cannot be reversed.

### How could this work?

User journey:

1. In the [view mode](https://github.com/freedomofpress/dangerzone/issues/424) the user draws black squares over the areas to be blacked out
2. After all redactions are done, the user saves the final document

**Technical explanation:** the host receives all the rasterized images. As the user adds a black box to the image, with the help of an image manipulation module (like [Pillow](https://pypi.org/project/pillow/)) it adds those black boxes to the final image.

If we want extra rasterization assurances, we can convert the final PDF through Dangerzone one more time to ensure proper rasterization.

### Implementation Risks and Unmitigated Risks

We should keep in mind that redaction alone may not be enough to eliminate all unredaction risks. The best advice is never to publish source documents and, if needed, to retype them. I can think of several other ways that redaction could still be bypassed:

- **invisible watermarks**: if the purpose is to identify the leaker, then printer dots, space-width variations, etc. could all be used. No redaction can save this form of identification. Only document retyping can potentially help there.
- **character width** can be [used to reverse redactions](https://www.wired.com/story/redact-pdf-online-privacy/) ([related paper](https://arxiv.org/pdf/2206.02285.pdf))
- **compression artifacts** can [leave traces](https://www.comp.nus.edu.sg/~changec/publications/2008_IH_Residual_Information_Redacted_Image.pdf) of what was hidden. In pre-compressed artifacts like images we cannot help much, as the whole element has to be redacted. However, Dangerzone also compresses documents. We could make sure to only do this in the final rasterization (i.e. the one with the redaction boxes).

qvm-convert-pdf if I recall correctly converts the pages into PNGs, which means they are compressed. The blackout part is best done before this compression to ensure no information from the blacked out part is leaked.

In other words, when getting the raw pixel data from each rendered page, that is the perfect opportunity for replacing certain pixels with black.
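A minimal sketch of that ordering, assuming the rendered page is available as a raw 8-bit RGB buffer in row-major order. The `black_out` helper, the tiny 4x3 page and the box coordinates are hypothetical, and `zlib` merely stands in for whatever encoder runs afterwards. The point is only that the pixels are overwritten before any compression sees them.

```python
import zlib


def black_out(pixels: bytearray, width: int, height: int,
              box: tuple[int, int, int, int]) -> None:
    """Overwrite RGB pixels inside box with black, in place.

    box is (left, top, right, bottom); right and bottom are exclusive.
    The box is clipped to the page so it cannot write outside the buffer.
    """
    left, top, right, bottom = box
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, width), min(bottom, height)
    if left >= right or top >= bottom:
        return  # nothing to redact
    for y in range(top, bottom):
        start = (y * width + left) * 3
        end = (y * width + right) * 3
        pixels[start:end] = bytes(end - start)  # zero bytes are black


width, height = 4, 3
page = bytearray(b"\xff" * (width * height * 3))  # all-white RGB page
black_out(page, width, height, (1, 0, 3, 2))      # redact a 2x2 pixel area
compressed = zlib.compress(bytes(page))           # compress only after redaction
```

Because the redacted region is uniformly black before encoding, the encoder never receives the original pixel values for that area.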

qubist · May 8, 2024, 6:18pm

There is a misunderstanding IMHO. I’ve given and focused on two examples where and how a double-door could help (sanitizing and doublechecking). The story of the investigative journalist is just a scenario where those two examples of double-doors are used. The process of redacting is not part of those two examples but part of the scenario.

Considering the rest of the discussion, it seems to me the confusion comes from the fact that you use generic terms (copying files between qubes) for actually describing a very specific thing (copying one particular file type, PDF, processing it with a particular rasterizer tool).

Imagine, you want to copy a secret file from a more trusted qube X to a less trusted qube Z.

When I read this, I think of something of ultimate secrecy, e.g. a private key. My first association in regards to sanitizing a received file is “removing malware from a file” (e.g. antivirus cleanup).

After your example about the journalist, the suggested workflow implies any file format (not just PDF) and is not limited to a particular tool (qube’s rasterizer). The security model in my workflow is based entirely on compartmentalization and not on a particular “sanitizer” tool, which itself can be vulnerable (e.g. through the libs it uses) and perhaps propagate the dirtiness of data to the “sanitized” file.

IIUC, your workflow (second example) suggests:

distrusted input → [less trusted qube X] → “sanitize” → review → [more trusted qube Z]

Personally, I would never do that and I would dislike seeing such a suggestion in a doc, because in such a workflow, everything is only as secure as the sanitizer package. The very concept of a data flow in the direction distrusted → trusted is insecure IMO, whatever in-between steps there might be.

You can, of course, consider all qubes to have the same security level.

I don’t think in levels. I think of purpose and data flow - one qube does one thing only and any qube “tells less than it hears”.

Opening a potentially malicious file could be harmful for the whole system.

The rasterizer also opens the file it rasterizes.

And: Since a sanitized file is more trustworthy, you can save it to a “more trusted” qube (whatever “more trusted” means to you).

As per my previous reply, without an objective and verifiable measure it means nothing. Any file not created by me from scratch in a clean offline qube is distrusted, and so is its presence. The only exception might be a short plain-text file.

When connecting several double-doors in a row, it looks like a multi-door, of course.

You can name it multi-single-door, if you will. Or even-more-multi-half-door