TL;DR: an optional slave mode should be available (enabled by a command line switch), in order to use the backend's standard output as a dedicated communication channel between the backend's process and the one which spawned it. This would fix synchronization issues between the two, as well as the initial network setup (for instance, knowing the port number on which the backend managed to run).
Read the details below to know more, especially why we decided to use the standard output and not another file stream.
Introduction
Current backend behavior
For now the backend behaves the same way no matter how it was run.
It is completely agnostic about who is going to use it and when; therefore, no client should connect to the server before it is fully ready.
It also assumes that its standard output is ultimately attached to a terminal user interface, and it uses it solely for user interaction purposes (logging). For that it uses the standard `console` module.
Note that third-party modules used by the backend may behave the same way, and we have no control over them, nor can we predict their behavior! They might even use the `process.stdout` standard output stream directly.
Additional desired behavior
Sometimes the server might run in a closed environment, launched by another process which would use it exclusively.
In this case, here is what changes:
- the server is no longer expected to be controlled by the user (through the terminal interface), but by the application that launched it
- therefore, writing user-intended messages to a terminal doesn't really make sense; we should rather send specific messages tagged as logs to the parent application, which would then do what it can/wants with them (like displaying them on its own terminal interface)
- the two processes, client and backend, should be able to communicate through a privileged channel, in order to establish proper synchronization before moving on to standard network (HTTP) communication (and also to exchange data purely related to process management, not to the backend's services)
This is the purpose of the slave mode.
Activation of the slave mode
The default mode should remain the current one.
To activate the slave mode, I suggest adding a command line option; this is the simplest thing for such use cases (and later on we could add a parameter to the argument if needed).
Proposed name: `--slave` (short: `-s`)
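A minimal sketch of how the switch could be detected at startup (the flag names match the proposal above; everything else is illustrative):

```js
// Hypothetical sketch: detect the proposed --slave / -s switch as early
// as possible, before any other module is loaded.
const slaveMode = process.argv.includes('--slave') || process.argv.includes('-s');
```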
Setup of the communication channel
See the section at the bottom for the reasons why we have to hijack the standard output for our purpose instead of using a custom stream.
As we need to isolate the standard output stream for our purpose, we need to make sure that no one executes code which directly or indirectly writes to it.
What we can (reasonably) do is:
- basic solution: override the methods of the `console` module that use `process.stdout` behind the scenes
- safer solution: change the `process.stdout` reference to point to another stream (preferably `process.stderr`, to choose something similar)
Note that this must be done at the very beginning of the process, so that no one can save the reference to stdout (or simply use it before us).
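A sketch of both solutions, assuming Node.js; as noted above, it must run as the very first thing in the process:

```js
// Hypothetical sketch: run this before loading any other module, so that
// nothing can capture the original stdout first.
const { Console } = require('console');

// Keep a private handle on the real stdout for the protocol messages.
const writeToChannel = process.stdout.write.bind(process.stdout);

// Basic solution: rebind the global console to stderr, so console.log()
// and friends no longer touch stdout.
globalThis.console = new Console({ stdout: process.stderr, stderr: process.stderr });

// Safer solution (approximated here): make writes to process.stdout land
// on stderr. Depending on the Node.js version, redefining the process.stdout
// property itself may also be possible; overriding write() has the same
// effect for code that uses the stream object.
process.stdout.write = process.stderr.write.bind(process.stderr);
```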
What we can't (reasonably) do:
`process.stdout` is just a handy abstraction of the standard output as a Node.js stream. However, the real reference to the standard output is associated with file descriptor 1; this is a known convention (see the section at the bottom for information about how file descriptors work).
Therefore, as long as any low-level IO method directly using file descriptors is available, we can't prevent anyone from using it. The only way would be to override all of those methods and simply convert every file descriptor equal to 1 into 2 (corresponding to the error stream this time) before calling the actual functions.
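As an illustration only, such an interception could look like this (a sketch covering just `fs.write`; `fs.writeSync` and the other fd-based methods would need the same treatment):

```js
// Hypothetical sketch: redirect low-level writes aimed at fd 1 (stdout)
// to fd 2 (stderr) before calling the real function.
const fs = require('fs');
const realWrite = fs.write;

fs.write = function (fd, ...args) {
  return realWrite.call(fs, fd === 1 ? 2 : fd, ...args);
};
```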
Note that removing or changing the entry associated with file descriptor 1 is not possible, since the file descriptor table is managed by the OS for each process.
Format of the messages
I don't know yet how the messages should be formatted. The only thing I can tell is that what is written on the stream should be a sequence of JSON objects (their string representations, I mean), for portability and simplicity.
This way, when the parent application detects data on the stream, it will read it until it gets a full, valid JSON representation (probably a number should be sent beforehand to indicate the expected length of the data) and then parse it to obtain the object wrapping the message.
Each message object should then have a property indicating the message type, maybe other properties purely related to message management (size, date, message not finished, whatever), and finally one property containing the value of the message.
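To make the idea concrete, here is a sketch of such a length-prefixed framing; the property names (`type`, `date`, `value`) and the exact prefix format are illustrative, not a settled design:

```js
// Hypothetical sketch: a decimal byte length, a newline, then the JSON
// payload. The property names below are placeholders.
function sendMessage(stream, type, value) {
  const payload = JSON.stringify({ type, date: Date.now(), value });
  stream.write(`${Buffer.byteLength(payload)}\n${payload}`);
}

// For instance, telling the parent which port the HTTP server ended up on:
// sendMessage(process.stdout, 'ready', { port: server.address().port });
```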
A better solution: using a custom file
The idea is to programmatically set up a communication channel (an IPC) between the parent process and the child process. However, here are the requirements:
- the information about how to use the IPC must be known by the two processes
- this IPC must be ready to work for both processes as soon as the child begins running
The problem we currently face is that the network solution, based on sockets, doesn't fulfill the second requirement: the port number is not known in advance.
Files like the standard output are the best solution; however, what we would like is to use another, dedicated file. This file would be created by the parent and passed to the child.
How could such a file be passed to the spawned process? Below is the explanation.
A reminder about file descriptors
DISCLAIMER: below is my understanding of the subject. The concepts around files and their corresponding structures are quite hazy.
Concepts
What is a file?
A file is an abstraction of a stream of data that can be read and written, no matter what is actually behind those two basic operations: network, file system, memory, etc.
A file is represented by a specific structure in memory. The latter stores, among other things, the current state of use of the file: reading offset, access rights, etc. This way, two files with different states can be created, even if in the end they operate on the same physical data.
Who can create and manipulate files?
Directly, only the OS can; user processes just use the OS's APIs to ask for creation, reading, writing and so on.
How are files referenced then?
Beyond what the OS could do for its own purposes, pointers to the file structures mentioned above (this corresponds to opened files) are stored in what we call file descriptor tables.
The OS exclusively manages those tables and creates one table per process.
At the process level, the only thing the program can do is refer to one of those pointers by giving its index in the table. This index is called a file descriptor (yes, the name is confusing: it is just an index for a pointer to a file description structure).
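A tiny illustration of this in Node.js (the file path is just an example):

```js
// File descriptors are just small integer indexes into the per-process
// table. 0, 1 and 2 are conventionally stdin, stdout and stderr, so the
// first file we open typically gets index 3.
const fs = require('fs');

const fd = fs.openSync('/tmp/example.txt', 'w'); // usually 3 here
fs.writeSync(fd, 'hello\n');
fs.closeSync(fd);
```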
What this means we can do
First, file management is done at the OS level, including the association of file references to processes.
Then, process creation is also done by the OS.
Thus, we can expect (and it is indeed the case) that we can tell the OS to give the newly created process an existing reference to a file, putting it at a specified index in the file descriptor table. For instance, we could agree on using index 3, or we could even pass the number on the command line (NB: this could clash with other modules of the child process only if they expect something particular from the chosen file descriptor).
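In Node.js this is exactly what the `stdio` option of `child_process.spawn` (listed in the next section) allows. A sketch, where the backend script name and the protocol handling are illustrative:

```js
// Hypothetical sketch (parent side): spawn the backend with an extra pipe
// that becomes file descriptor 3 in the child.
const { spawn } = require('child_process');

const child = spawn('node', ['backend.js', '--slave'], {
  // Entries 0-2 are stdin/stdout/stderr; the fourth entry is fd 3.
  stdio: ['pipe', 'pipe', 'pipe', 'pipe'],
});

child.stdio[3].on('data', (chunk) => {
  // parse protocol messages sent by the backend here
});

// Child side: the backend could wrap fd 3 in a stream, e.g.
// const channel = require('fs').createWriteStream(null, { fd: 3 });
```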
The issue
Unfortunately, not all platforms give direct (or even indirect) access to this feature:
✓ Node.js: the `stdio` option of `child_process.spawn` allows describing how to configure each file descriptor entry for the new process
✓ PHP: `proc_open`'s second argument `$descriptorSpec` allows doing that as well
✘ Java: only the `Runtime.exec` method is available, and it doesn't allow much
✘ Python: I don't see anything in `subprocess` except the classical management of the standard file descriptors
...
So, to remain compatible with most platforms, the current choice is to use (and hijack, yes) the standard output.
Last thing to mention: named pipes
Pipes follow the file abstraction, but they reside only in memory.
If the reference to the pipe cannot be given at process creation by using the OS's ability to configure the file descriptor table, there is another solution: using named pipes. A named pipe acts like a file from the file system, so name clashes (even if less likely in this case, due to the reduced number of instances), access rights and so on must be watched.
The name can then be agreed upon or passed on the command line, as usual.
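For illustration, in Node.js the `net` module can listen on such a named endpoint (a Unix domain socket path here; on Windows it would be a `\\.\pipe\...` named pipe path instead). The path below is purely illustrative:

```js
// Hypothetical sketch (parent side): listen on a named endpoint that the
// child is told about, e.g. via a command line argument.
const net = require('net');

const PIPE_PATH = '/tmp/backend-ipc.sock'; // illustrative, agreed-upon name

const server = net.createServer((connection) => {
  connection.on('data', (chunk) => {
    // parse protocol messages sent by the backend here
  });
});
server.listen(PIPE_PATH);

// Child side: net.connect(PIPE_PATH) opens the same channel.
```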
The only issue, beyond the possibility of name clashes, is the potential complexity of applying this method, and above all the risk of introducing dependencies on the platform (OS).
In slave mode, it should be possible for the backend to detect that its master (launcher) process is no longer running, in which case there should be a possibility to shut the backend down (self closing).
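One possible way to do that, sketched under the assumption that the master keeps the child's stdin open for its whole lifetime: the end of stdin then signals the master's death.

```js
// Hypothetical sketch: when the parent dies, the pipe feeding our stdin
// is closed, so the 'end' event fires and we can shut down.
if (slaveMode) {
  process.stdin.resume();
  process.stdin.on('end', () => {
    console.error('Master process is gone, shutting down.');
    process.exit(0);
  });
}
```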