1 /*
2 Internal file viewer for the Midnight Commander
3 Function for plain view
4
5 Copyright (C) 1994-2025
6 Free Software Foundation, Inc.
7
8 Written by:
9 Miguel de Icaza, 1994, 1995, 1998
10 Janne Kukonlehto, 1994, 1995
11 Jakub Jelinek, 1995
12 Joseph M. Hinkle, 1996
13 Norbert Warmuth, 1997
14 Pavel Machek, 1998
15 Roland Illig <roland.illig@gmx.de>, 2004, 2005
16 Slava Zanko <slavazanko@google.com>, 2009
17 Andrew Borodin <aborodin@vmail.ru>, 2009-2022
18 Ilia Maslakov <il.smind@gmail.com>, 2009
19 Rewritten almost from scratch by:
20 Egmont Koblinger <egmont@gmail.com>, 2014
21
22 This file is part of the Midnight Commander.
23
24 The Midnight Commander is free software: you can redistribute it
25 and/or modify it under the terms of the GNU General Public License as
26 published by the Free Software Foundation, either version 3 of the License,
27 or (at your option) any later version.
28
29 The Midnight Commander is distributed in the hope that it will be useful,
30 but WITHOUT ANY WARRANTY; without even the implied warranty of
31 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
32 GNU General Public License for more details.
33
34 You should have received a copy of the GNU General Public License
35 along with this program. If not, see <https://www.gnu.org/licenses/>.
36
37 ------------------------------------------------------------------------------------------------
38
39 The viewer is implemented along the following design principles:
40
41 Goals: Always display simple scripts, double wide (CJK), combining accents and spacing marks
42 (often used e.g. in Devanagari) perfectly. Make the arrow keys always work correctly.
43
44 Absolutely non-goal: RTL.
45
46 Terminology:
47
48 - A "paragraph" is the text between two adjacent newline characters. A "line" or "row" is a
49 visual row on the screen. In wrap mode, the viewer formats a paragraph into one or more lines.
50
51 - The Unicode glossary <https://www.unicode.org/glossary/> doesn't seem to have a notion of "base
52 character followed by zero or more combining characters". The closest matches are "Combining
53 Character Sequence" meaning a base character followed by one or more combining characters, or
54 "Grapheme" which seems to exclude non-printable characters such as newline. In this file,
55 "combining character sequence" (or any obvious abbreviation thereof) means a base character
56 followed by zero or more (up to a current limit of 4) combining characters.
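
As a rough illustration of this definition (not part of the viewer), the sketch below groups
codepoints into such sequences. All sketch_* names are hypothetical, and sketch_is_combining() is
an assumed stand-in that only knows the U+0300..U+036F block, whereas the real code relies on
proper Unicode property lookups. A leading combining character is simply treated as the base here.

```c
#include <stddef.h>

#define SKETCH_MAX_COMBINING 4  // mirrors the "current limit of 4" mentioned above

// Assumption: only the Combining Diacritical Marks block counts as combining.
static int
sketch_is_combining (unsigned int cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

// Returns how many codepoints, starting at cps[pos], form one sequence:
// one base character plus at most SKETCH_MAX_COMBINING combining characters.
// Returns 0 at the end of the input.
static size_t
sketch_sequence_length (const unsigned int *cps, size_t len, size_t pos)
{
    size_t n;

    if (pos >= len)
        return 0;

    n = 1;  // the base character
    while (pos + n < len && n <= SKETCH_MAX_COMBINING && sketch_is_combining (cps[pos + n]))
        n++;
    return n;
}
```

With five accents after a base character, the first call would return 5 (base plus four accents)
and the fifth accent would start a sequence of its own.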
57
58 ------------------------------------------------------------------------------------------------
59
60 The parser-formatter is designed to be stateless across paragraphs. This is so that we can walk
61 backwards without having to reparse the whole file (we still need to reparse and
62 reformat the whole paragraph, but that is a lot cheaper). This principle needs to be changed if we
63 ever get to address tickets 1849/2977, but then we can still store (for efficiency) the parser
64 state at the beginning of the paragraph, and safely walk backwards if we don't cross an escape
65 character.
66
67 The parser-formatter, however, definitely needs to carry a state across lines. Currently this
68 state contains:
69
70 - The logical column (as if we didn't wrap). This is used for handling TAB characters after a
71 wordwrap consistently with less.
72
73 - Whether the last nroff character was bold or underlined. This is used for displaying the
74 ambiguous _\b_ sequence consistently with less.
75
76 - Whether the desired way of displaying a lonely combining accent or spacing mark is to place it
77 over a dotted circle (we do this at the beginning of the paragraph or after a TAB), or to ignore
78 the combining char and show replacement char for the spacing mark (we do this if e.g. too many
79 of these were encountered and hence we don't glue them with their base character).
80
81 - (This state needs to be expanded if e.g. we decide to print verbose replacement characters
82 (e.g. "<U+0080>") and allow these to wrap around lines.)
83
84 The state also contains the file offset, as it doesn't make sense to ever know the state without
85 knowing the corresponding offset.
86
87 The state depends on various settings (viewer width, encoding, nroff mode, charwrap or wordwrap
88 mode (if we'll have that one day) etc.) and needs to be recomputed if any of these change.
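
A minimal sketch of such a state, for illustration only. The field names follow the list above,
but the struct layout, the sketch_* names and the int flags are assumptions, not the viewer's
actual declarations. It also shows why the logical column matters: the TAB width is derived from
it rather than from the on-screen column.

```c
#include <sys/types.h>  // off_t

// Hypothetical mirror of the state described above.
typedef struct
{
    off_t offset;                        // file offset this state belongs to
    off_t unwrapped_column;              // logical column, as if we didn't wrap
    int nroff_underscore_is_underlined;  // last nroff character was underlined, not bold
    int print_lonely_combining;          // dotted circle (1) vs. ignore or replace (0)
} sketch_state_t;

// Default state at the beginning of a paragraph.
static void
sketch_state_init (sketch_state_t *state, off_t offset)
{
    state->offset = offset;
    state->unwrapped_column = 0;
    state->nroff_underscore_is_underlined = 0;
    state->print_lonely_combining = 1;
}

// Width of a TAB, computed from the logical column so that the result does not
// depend on where the paragraph happened to wrap.
static int
sketch_tab_width (const sketch_state_t *state, int tab_spacing)
{
    return tab_spacing - (int) (state->unwrapped_column % tab_spacing);
}
```

Because such a state is a small plain struct, it can be copied by value before a lookahead and the
copy discarded if the lookahead does not work out.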
89
90 Walking forwards is usually relatively easy both in the file and on the screen. Walking
91 backwards within a paragraph would only be possible in some special cases and even then it would
92 be painful, so we always walk back to the beginning of the paragraph and reparse-reformat from
93 there.
94
95 (Walking back within a line in the file would have at least the following difficulties: handling
96 the parser state; processing invalid UTF-8; processing invalid nroff (e.g. what is "_\bA\bA"?).
97 Walking back on the display: we wouldn't know where to display the last line of a paragraph, or
98 where to display a line if its following line starts with a wide (CJK or Tab) character. Long
99 story short: just forget this approach.)
100
101 Most important variables:
102
103 - dpy_start: Both in unwrap and wrap modes this points to the beginning of the topmost displayed
104 paragraph.
105
106 - dpy_text_column: Only in unwrap mode, an additional horizontal scroll.
107
108 - dpy_paragraph_skip_lines: Only in wrap mode, an additional vertical scroll (the number of
109 lines that are scrolled off at the top from the topmost paragraph).
110
111 - dpy_state_top: Only in wrap mode, the offset and parser-formatter state at the line where
112 displaying the file begins is cached here.
113
114 - dpy_wrap_dirty: Set if some parameter has changed that makes it necessary to reparse-redisplay the
115 topmost paragraph.
116
117 In wrap mode, the three variables "dpy_start", "dpy_paragraph_skip_lines" and "dpy_state_top"
118 are kept consistent. Think of the first two as the ones describing the position, and the third
119 as a cached value for better performance so that we don't need to wrap the invisible beginning
120 of the topmost paragraph over and over again. The third value needs to be recomputed each time a
121 parameter that influences parsing or displaying the file (e.g. width of screen, encoding, nroff
122 mode) changes; this is signaled by "dpy_wrap_dirty" to force recomputing "dpy_state_top" (and
123 clamp "dpy_paragraph_skip_lines" if necessary).
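
The relationship between these variables can be sketched with a heavily simplified model. This is
an illustration under stated assumptions, not the viewer's code: every character is assumed to be
one byte long and one cell wide, so a paragraph of para_len characters wraps into rows of exactly
cols characters. The sketch_* names and the para_len/cols parameters are hypothetical.

```c
// Hypothetical, reduced view of the position variables described above.
typedef struct
{
    long dpy_start;                // offset of the topmost displayed paragraph
    int dpy_paragraph_skip_lines;  // rows of that paragraph scrolled off the top
    long dpy_state_top_offset;     // cached: offset where the topmost visible row begins
    int dpy_wrap_dirty;            // nonzero if the cache needs recomputing
} sketch_view_t;

// Recompute the cached top offset from dpy_start and dpy_paragraph_skip_lines,
// clamping the latter to the last row if the paragraph became shorter.
static void
sketch_wrap_fixup (sketch_view_t *view, long para_len, int cols)
{
    int wanted = view->dpy_paragraph_skip_lines;
    long remaining = para_len;

    if (!view->dpy_wrap_dirty)
        return;
    view->dpy_wrap_dirty = 0;

    view->dpy_paragraph_skip_lines = 0;
    view->dpy_state_top_offset = view->dpy_start;

    // Re-walk the invisible rows; stop early once the last row is reached.
    while (wanted-- > 0 && remaining > cols)
    {
        view->dpy_state_top_offset += cols;
        remaining -= cols;
        view->dpy_paragraph_skip_lines++;
    }
}
```

For example, a 25-character paragraph on a 10-column screen has three rows, so a requested skip of
5 rows would be clamped to 2 in this model.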
124
125 ------------------------------------------------------------------------------------------------
126
127 Help integration
128
129 I'm planning to port the help viewer to this codebase.
130
131 Splitting at sections would still happen in the help viewer. It would either copy a section, or
132 set force_max and a similar force_min to limit displaying to one section only.
133
134 Parsing the help format would go next to the nroff parser. The colors, alternate character set,
135 and emitting the version number would go to the "state". (The version number would be
136 implemented by emitting remaining characters of a buffer in the "state" one by one, without
137 advancing in the file position.)
138
139 The active link would be drawn similarly to the search highlight. Other than that, the viewer
140 wouldn't care about links (except for their color). help.c would keep track of which one is
141 highlighted, how to advance to the next/prev on an arrow, how the scroll offset needs to be
142 adjusted when moving, etc.
143
144 Add wrapping at word boundaries to where wrapping at char boundaries happens now.
145 */
146
147 #include <config.h>
148
149 #include "lib/global.h"
150 #include "lib/tty/tty.h"
151 #include "lib/skin.h"
152 #include "lib/util.h" // is_printable()
153 #include "lib/charsets.h"
154
155 #include "src/setup.h" // option_tab_spacing
156
157 #include "internal.h"
158
159 /*** global variables ****************************************************************************/
160
161 /*** file scope macro definitions ****************************************************************/
162
163 /* The Unicode standard recommends that lonely combining characters are printed over a dotted
164 * circle. If the terminal is not UTF-8, this will be replaced by a dot anyway. */
165 #define BASE_CHARACTER_FOR_LONELY_COMBINING 0x25CC // dotted circle
166 #define MAX_COMBINING_CHARS 4 // both slang and ncurses support exactly 4
167
168 /* I think anything other than space (e.g. arrows) just introduces visual clutter without actually
169 * adding value. */
170 #define PARTIAL_CJK_AT_LEFT_MARGIN ' '
171 #define PARTIAL_CJK_AT_RIGHT_MARGIN ' '
172
173 /*
174 * Wrap mode: This is for safety so that jumping to the end of file (which already includes
175 * scrolling back by a page) and then walking backwards is reasonably fast, even if the file is
176 * extremely large and consists of maybe full zeros or something like that. If there's no newline
177 * found within this limit, just start displaying from there and see what happens. We might get
178 * some displaying parameters (most importantly the columns) incorrect, but at least will show the
179 * file without spinning the CPU for ages. When scrolling back to that point, the user might see a
180 * garbled first line (even starting with an invalid partial UTF-8), but then walking back by yet
181 * another line should fix it.
182 *
183 * Unwrap mode: This is not used, we wouldn't be able to do anything reasonable without walking
184 * back a whole paragraph (well, view->data_area.height paragraphs actually).
185 */
186 #define MAX_BACKWARDS_WALK_IN_PARAGRAPH (100 * 1000)
187
188 /*** file scope type declarations ****************************************************************/
189
190 /*** forward declarations (file scope functions) *************************************************/
191
192 /*** file scope variables ************************************************************************/
193
194 /* --------------------------------------------------------------------------------------------- */
195 /*** file scope functions ************************************************************************/
196 /* --------------------------------------------------------------------------------------------- */
197
198 /* TODO: These methods shouldn't be necessary, see ticket 3257 */
199
200 static int
201 mcview_wcwidth (const WView *view, int c)
202 {
203 if (view->utf8)
204 {
205 if (g_unichar_iswide (c))
206 return 2;
207 if (g_unichar_iszerowidth (c))
208 return 0;
209 }
210
211 return 1;
212 }
213
214 /* --------------------------------------------------------------------------------------------- */
215
216 static inline gboolean
217 mcview_ismark (const WView *view, int c)
218 {
219 return (view->utf8 && g_unichar_ismark (c));
220 }
221
222 /* --------------------------------------------------------------------------------------------- */
223
224 /* actually is_non_spacing_mark_or_enclosing_mark */
225 static gboolean
226 mcview_is_non_spacing_mark (const WView *view, int c)
227 {
228 if (view->utf8)
229 {
230 const GUnicodeType type = g_unichar_type (c);
231
232 return type == G_UNICODE_NON_SPACING_MARK || type == G_UNICODE_ENCLOSING_MARK;
233 }
234
235 return FALSE;
236 }
237
238 /* --------------------------------------------------------------------------------------------- */
239
240 #if 0
241 static gboolean
242 mcview_is_spacing_mark (const WView *view, int c)
243 {
244 return (view->utf8 && g_unichar_type (c) == G_UNICODE_SPACING_MARK);
245 }
246 #endif
247
248 /* --------------------------------------------------------------------------------------------- */
249
250 static gboolean
251 mcview_isprint (const WView *view, int c)
252 {
253 if (!view->utf8)
254 c = convert_from_8bit_to_utf_c ((unsigned char) c, view->converter);
255 return g_unichar_isprint (c);
256 }
257
258 /* --------------------------------------------------------------------------------------------- */
259
260 static int
261 mcview_char_display (const WView *view, int c, char *s)
262 {
263 if (mc_global.utf8_display)
264 {
265 if (!view->utf8)
266 c = convert_from_8bit_to_utf_c ((unsigned char) c, view->converter);
267 if (!g_unichar_isprint (c))
268 c = '.';
269 return g_unichar_to_utf8 (c, s);
270 }
271 if (view->utf8)
272 {
273 if (g_unichar_iswide (c))
274 {
275 s[0] = s[1] = '.';
276 return 2;
277 }
278 if (g_unichar_iszerowidth (c))
279 return 0;
280 // TODO the is_printable check below will be broken for this
281 c = convert_from_utf_to_current_c (c, view->converter);
282 }
283 else
284 {
285 // TODO the is_printable check below will be broken for this
286 c = convert_to_display_c (c);
287 }
288
289 // TODO this is very-very buggy by design: ticket 3257 comments 0-1
290 if (!is_printable (c))
291 c = '.';
292 *s = c;
293 return 1;
294 }
295
296 /* --------------------------------------------------------------------------------------------- */
297
298 /**
299 * Just for convenience, a common interface in front of mcview_get_utf and mcview_get_byte, so that
300 * the caller doesn't have to care about utf8 vs 8-bit modes.
301 *
302 * Normally: stores c, updates state, returns TRUE.
303 * At EOF: state is unchanged, c is undefined, returns FALSE.
304 *
305 * Just as with mcview_get_utf(), invalid UTF-8 is reported using negative integers.
306 *
307 * Also, temporary hack: handle force_max here.
308 * TODO: move it to lower layers (datasource.c)?
309 */
310 static gboolean
311 mcview_get_next_char (WView *view, mcview_state_machine_t *state, int *c)
312 {
313 // Pretend EOF if we reached force_max
314 if (view->force_max >= 0 && state->offset >= view->force_max)
315 return FALSE;
316
317 if (view->utf8)
318 {
319 int char_length = 0;
320
321 if (!mcview_get_utf (view, state->offset, c, &char_length))
322 return FALSE;
323 // Pretend EOF if we crossed force_max
324 if (view->force_max >= 0 && state->offset + char_length > view->force_max)
325 return FALSE;
326
327 state->offset += char_length;
328 return TRUE;
329 }
330
331 if (!mcview_get_byte (view, state->offset, c))
332 return FALSE;
333 state->offset++;
334 return TRUE;
335 }
336
337 /* --------------------------------------------------------------------------------------------- */
338 /**
339 * This function parses the next nroff character and gives it to you along with its desired color,
340 * so you never have to care about nroff again.
341 *
342 * The nroff mode does the backspace trick for every single character (Unicode codepoint). At least
343 * that's what the GNU groff 1.22 package produces, and that's what less 458 expects. For
344 * double-wide characters (CJK), still only a single backspace is emitted. For combining accents
345 * and such, the print-backspace-print step is repeated for the base character and then for each
346 * accent separately.
347 *
348 * So, the right place for this layer is after the bytes are interpreted in UTF-8, but before
349 * joining a base character with its combining accents.
350 *
351 * Normally: stores c and color, updates state, returns TRUE.
352 * At EOF: state is unchanged, c and color are undefined, returns FALSE.
353 *
354 * color can be null if the caller doesn't care.
355 */
356 static gboolean
357 mcview_get_next_maybe_nroff_char (WView *view, mcview_state_machine_t *state, int *c, int *color)
358 {
359 mcview_state_machine_t state_after_three_chars;
360 mcview_state_machine_t state_after_five_chars;
361 int c2, c3, c4, c5;
362
363 if (color != NULL)
364 *color = VIEWER_NORMAL_COLOR;
365
366 if (!view->mode_flags.nroff)
367 return mcview_get_next_char (view, state, c);
368
369 if (!mcview_get_next_char (view, state, c))
370 return FALSE;
371 // Don't allow nroff formatting around CR, LF, TAB or other special chars
372 if (!mcview_isprint (view, *c))
373 return TRUE;
374
375 state_after_three_chars = *state;
376
377 if (!mcview_get_next_char (view, &state_after_three_chars, &c2))
378 return TRUE;
379 if (c2 != '\b')
380 return TRUE;
381
382 if (!mcview_get_next_char (view, &state_after_three_chars, &c3))
383 return TRUE;
384 if (!mcview_isprint (view, c3))
385 return TRUE;
386
387 state_after_five_chars = state_after_three_chars;
388
389 /* Bold and underlined letter x is denoted by: _ \b x \b x */
390 if (*c == '_' && mcview_get_next_char (view, &state_after_five_chars, &c4) && c4 == '\b'
391 && mcview_get_next_char (view, &state_after_five_chars, &c5) && c3 == c5)
392 {
393 *c = c3;
394 *state = state_after_five_chars;
395 if (color != NULL)
396 *color = VIEWER_BOLD_UNDERLINED_COLOR;
397 }
398 else if (*c == '_' && c3 == '_')
399 {
400 *state = state_after_three_chars;
401 if (color != NULL)
402 *color =
403 state->nroff_underscore_is_underlined ? VIEWER_UNDERLINED_COLOR : VIEWER_BOLD_COLOR;
404 }
405 else if (*c == c3)
406 {
407 *state = state_after_three_chars;
408 state->nroff_underscore_is_underlined = FALSE;
409 if (color != NULL)
410 *color = VIEWER_BOLD_COLOR;
411 }
412 else if (*c == '_')
413 {
414 *c = c3;
415 *state = state_after_three_chars;
416 state->nroff_underscore_is_underlined = TRUE;
417 if (color != NULL)
418 *color = VIEWER_UNDERLINED_COLOR;
419 }
420
421 return TRUE;
422 }
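
/* Illustration only, not part of the viewer: a minimal sketch of the overstrike decoding above,
 * reduced to plain ASCII and without the combined bold+underlined form (_ \b x \b x). The
 * sketch_* names are hypothetical. As in the function above, the ambiguous "_\b_" takes the
 * attribute of the most recent unambiguous overstrike. */

```c
#include <ctype.h>
#include <stddef.h>

typedef enum
{
    SKETCH_NORMAL,
    SKETCH_BOLD,
    SKETCH_UNDERLINED
} sketch_attr_t;

/* Decode one character at s[*pos] (the caller guarantees *pos < len), advance *pos past it and
 * past any overstrike, and store its attribute in *attr. */
static char
sketch_nroff_next (const char *s, size_t len, size_t *pos, int *underscore_is_underlined,
                   sketch_attr_t *attr)
{
    const char c = s[*pos];
    char c3;

    *attr = SKETCH_NORMAL;
    *pos += 1;

    /* No overstrike: special character, not enough input left, or no backspace */
    if (!isprint ((unsigned char) c) || *pos + 1 >= len || s[*pos] != '\b')
        return c;
    c3 = s[*pos + 1];
    if (!isprint ((unsigned char) c3))
        return c;

    if (c == '_' && c3 == '_')
    {
        /* Ambiguous: bold underscore or underlined underscore */
        *attr = *underscore_is_underlined ? SKETCH_UNDERLINED : SKETCH_BOLD;
        *pos += 2;
        return c;
    }
    if (c == c3)
    {
        *underscore_is_underlined = 0;
        *attr = SKETCH_BOLD;
        *pos += 2;
        return c;
    }
    if (c == '_')
    {
        *underscore_is_underlined = 1;
        *attr = SKETCH_UNDERLINED;
        *pos += 2;
        return c3;
    }

    /* Not a recognized overstrike: leave the backspace in the stream */
    return c;
}
```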
423
424 /* --------------------------------------------------------------------------------------------- */
425 /**
426 * Get one base character, along with its combining or spacing mark characters.
427 *
428 * (A spacing mark is a character that extends the base character's width 1 into a combined
429 * character of width 2, yet these two character cells should not be separated. E.g. Devanagari
430 * <U+0939><U+094B>.)
431 *
432 * This method exists mainly for two reasons. One is to be able to tell if we fit on the current
433 * line or need to wrap to the next one. The other is that both slang and ncurses seem to require
434 * that the character and its combining marks are printed in a single call (or is it just a
435 * limitation of mc's wrapper to them?).
436 *
437 * For convenience, this method takes care of converting CR or CR+LF into LF.
438 * TODO this should probably happen later, when displaying the file?
439 *
440 * Normally: stores cs and color, updates state, returns >= 1 (entries in cs).
441 * At EOF: state is unchanged, cs and color are undefined, returns 0.
442 *
443 * @param view ...
444 * @param state the parser-formatter state machine's state, updated
445 * @param cs store the characters here
446 * @param clen the room available in cs (that is, at most clen-1 combining marks are allowed), must
447 * be at least 2
448 * @param color if non-NULL, store the color here, taken from the first codepoint's color
449 * @return the number of entries placed in cs, or 0 on EOF
450 */
451 static int
452 mcview_next_combining_char_sequence (WView *view, mcview_state_machine_t *state, int *cs, int clen,
453 int *color)
454 {
455 int i = 1;
456
457 if (!mcview_get_next_maybe_nroff_char (view, state, cs, color))
458 return 0;
459
460 // Process \r and \r\n newlines.
461 if (cs[0] == '\r')
462 {
463 int cnext;
464
465 mcview_state_machine_t state_after_crlf = *state;
466 if (mcview_get_next_maybe_nroff_char (view, &state_after_crlf, &cnext, NULL)
467 && cnext == '\n')
468 *state = state_after_crlf;
469 cs[0] = '\n';
470 return 1;
471 }
472
473 // We don't want combining over non-printable characters. This includes '\n' and '\t' too.
474 if (!mcview_isprint (view, cs[0]))
475 return 1;
476
477 if (mcview_ismark (view, cs[0]))
478 {
479 if (!state->print_lonely_combining)
480 {
481 // First character is combining. Either just return it, ...
482 return 1;
483 }
484 else
485 {
486 // or place this (and subsequent combining ones) over a dotted circle.
487 cs[1] = cs[0];
488 cs[0] = BASE_CHARACTER_FOR_LONELY_COMBINING;
489 i = 2;
490 }
491 }
492
493 if (mcview_wcwidth (view, cs[0]) == 2)
494 {
495 // Don't allow combining or spacing mark for wide characters, is this okay?
496 return 1;
497 }
498
499 /* Look for more combining chars. Either at most clen-1 zero-width combining chars,
500 * or at most 1 spacing mark. Is this logic correct? */
501 for (; i < clen; i++)
502 {
503 mcview_state_machine_t state_after_combining;
504
505 state_after_combining = *state;
506 if (!mcview_get_next_maybe_nroff_char (view, &state_after_combining, &cs[i], NULL))
507 return i;
508 if (!mcview_ismark (view, cs[i]) || !mcview_isprint (view, cs[i]))
509 return i;
510 if (g_unichar_type (cs[i]) == G_UNICODE_SPACING_MARK)
511 {
512 // Only allow as the first combining char. Stop processing in either case.
513 if (i == 1)
514 {
515 *state = state_after_combining;
516 i++;
517 }
518 return i;
519 }
520 *state = state_after_combining;
521 }
522 return i;
523 }
524
525 /* --------------------------------------------------------------------------------------------- */
526 /**
527 * Parse, format and possibly display one visual line of text.
528 *
529 * Formatting starts at the given "state" (which encodes the file offset and parser and formatter's
530 * internal state). In unwrap mode, this should point to the beginning of the paragraph with the
531 * default state, the additional horizontal scrolling is added here. In wrap mode, this should
532 * point to the beginning of the line, with the proper state at that point.
533 *
534 * In wrap mode, if a line ends in a newline, it is consumed, even if it's exactly at the right
535 * edge. In unwrap mode, the whole remaining line, including the newline is consumed. Displaying
536 * the next line should start at "state"'s new value, or if we displayed the bottom line then
537 * state->offset tells the file offset to be shown in the top bar.
538 *
539 * If "row" is offscreen, don't actually display the line but still update "state" and return the
540 * proper value. This is used by mcview_wrap_move_down to advance in the file.
541 *
542 * @param view ...
543 * @param state the parser-formatter state machine's state, updated
544 * @param row print to this row
545 * @param paragraph_ended store TRUE if paragraph ended by newline or EOF, FALSE if wraps to next
546 * line
547 * @param linewidth store the width of the line here
548 * @return the number of rows, that is, 0 if we were already at EOF, otherwise 1
549 */
550 static int
551 mcview_display_line (WView *view, mcview_state_machine_t *state, int row, gboolean *paragraph_ended,
552 off_t *linewidth)
553 {
554 const WRect *r = &view->data_area;
555 off_t dpy_text_column = view->mode_flags.wrap ? 0 : view->dpy_text_column;
556 int col = 0;
557 int cs[1 + MAX_COMBINING_CHARS];
558 char str[(1 + MAX_COMBINING_CHARS) * MB_LEN_MAX + 1];
559 int i, j;
560
561 if (paragraph_ended != NULL)
562 *paragraph_ended = TRUE;
563
564 if (!view->mode_flags.wrap && (row < 0 || row >= r->lines) && linewidth == NULL)
565 {
566 /* Optimization: Fast forward to the end of the line, rather than carefully
567 * parsing and then not actually displaying it. */
568 off_t eol;
569 int retval;
570
571 eol = mcview_eol (view, state->offset);
572 retval = (eol > state->offset) ? 1 : 0;
573
574 mcview_state_machine_init (state, eol);
575 return retval;
576 }
577
578 while (TRUE)
579 {
580 int charwidth = 0;
581 mcview_state_machine_t state_saved;
582 int n;
583 int color;
584
585 state_saved = *state;
586 n = mcview_next_combining_char_sequence (view, state, cs, 1 + MAX_COMBINING_CHARS, &color);
587 if (n == 0)
588 {
589 if (linewidth != NULL)
590 *linewidth = col;
591 return (col > 0) ? 1 : 0;
592 }
593
594 if (view->search_start <= state->offset && state->offset < view->search_end)
595 color = VIEWER_SELECTED_COLOR;
596
597 if (cs[0] == '\n')
598 {
599 // New line: reset all formatting state for the next paragraph.
600 mcview_state_machine_init (state, state->offset);
601 if (linewidth != NULL)
602 *linewidth = col;
603 return 1;
604 }
605
606 if (mcview_is_non_spacing_mark (view, cs[0]))
607 {
608 // Lonely combining character. Probably leftover after too many combining chars. Just
609 // ignore.
610 continue;
611 }
612
613 // Nonprintable, or lonely spacing mark
614 if ((!mcview_isprint (view, cs[0]) || mcview_ismark (view, cs[0])) && cs[0] != '\t')
615 cs[0] = '.';
616
617 for (i = 0; i < n; i++)
618 charwidth += mcview_wcwidth (view, cs[i]);
619
620 /* Adjust the width for TAB. It's handled below along with the normal characters,
621 * so that it's wrapped consistently with them, and is painted with the proper
622 * attributes (although currently it can't have a special color). */
623 if (cs[0] == '\t')
624 {
625 charwidth = option_tab_spacing - state->unwrapped_column % option_tab_spacing;
626 state->print_lonely_combining = TRUE;
627 }
628 else
629 state->print_lonely_combining = FALSE;
630
631 /* In wrap mode only: We're done with this row if the character sequence wouldn't fit.
632 * Except if at the first column, because then it wouldn't fit in the next row either.
633 * In this extreme case let the unwrapped code below do its best to display it. */
634 if (view->mode_flags.wrap && (off_t) col + charwidth > dpy_text_column + (off_t) r->cols
635 && col > 0)
636 {
637 *state = state_saved;
638 if (paragraph_ended != NULL)
639 *paragraph_ended = FALSE;
640 if (linewidth != NULL)
641 *linewidth = col;
642 return 1;
643 }
644
645 // Display, unless outside of the viewport.
646 if (row >= 0 && row < r->lines)
647 {
648 if ((off_t) col >= dpy_text_column
649 && (off_t) col + charwidth <= dpy_text_column + (off_t) r->cols)
650 {
651 // The combining character sequence fits entirely in the viewport. Print it.
652 tty_setcolor (color);
653 widget_gotoyx (view, r->y + row, r->x + ((off_t) col - dpy_text_column));
654 if (cs[0] == '\t')
655 {
656 for (i = 0; i < charwidth; i++)
657 tty_print_char (' ');
658 }
659 else
660 {
661 j = 0;
662 for (i = 0; i < n; i++)
663 j += mcview_char_display (view, cs[i], str + j);
664 str[j] = '\0';
665 /* This is probably a bug in our tty layer, but tty_print_string
666 * normalizes the string, whereas tty_printf doesn't. Don't normalize,
667 * since we handle combining characters ourselves correctly, it's
668 * better if they are copy-pasted correctly. Ticket 3255. */
669 tty_printf ("%s", str);
670 }
671 }
672 else if ((off_t) col < dpy_text_column && (off_t) col + charwidth > dpy_text_column)
673 {
674 /* The combining character sequence would cross the left edge of the viewport.
675 * This cannot happen with wrap mode. Print replacement character(s),
676 * or spaces with the correct attributes for partial Tabs. */
677 tty_setcolor (color);
678 for (i = dpy_text_column;
679 i < (off_t) col + charwidth && i < dpy_text_column + (off_t) r->cols; i++)
680 {
681 widget_gotoyx (view, r->y + row, r->x + (i - dpy_text_column));
682 tty_print_anychar ((cs[0] == '\t') ? ' ' : PARTIAL_CJK_AT_LEFT_MARGIN);
683 }
684 }
685 else if ((off_t) col < dpy_text_column + (off_t) r->cols
686 && (off_t) col + charwidth > dpy_text_column + (off_t) r->cols)
687 {
688 /* The combining character sequence would cross the right edge of the viewport
689 * and we're not wrapping. Print replacement character(s),
690 * or spaces with the correct attributes for partial Tabs. */
691 tty_setcolor (color);
692 for (i = col; i < dpy_text_column + (off_t) r->cols; i++)
693 {
694 widget_gotoyx (view, r->y + row, r->x + (i - dpy_text_column));
695 tty_print_anychar ((cs[0] == '\t') ? ' ' : PARTIAL_CJK_AT_RIGHT_MARGIN);
696 }
697 }
698 }
699
700 col += charwidth;
701 state->unwrapped_column += charwidth;
702
703 if (!view->mode_flags.wrap && (off_t) col >= dpy_text_column + (off_t) r->cols
704 && linewidth == NULL)
705 {
706 /* Optimization: Fast forward to the end of the line, rather than carefully
707 * parsing and then not actually displaying it. */
708 off_t eol;
709
710 eol = mcview_eol (view, state->offset);
711 mcview_state_machine_init (state, eol);
712 return 1;
713 }
714 }
715 }
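
/* Illustration only, not part of the viewer: a minimal sketch of the wrap rule used above. A
 * sequence moves to the next row if it would not fit, except at column 0, where it would not fit
 * on the next row either. The sketch_count_rows name and the widths array (cell width of each
 * combining character sequence in one paragraph) are hypothetical. Unlike the real function, an
 * empty paragraph yields 0 rows here because the newline is not modelled. */

```c
#include <stddef.h>

/* Count the rows needed to wrap one paragraph into a viewport of "cols" columns. */
static int
sketch_count_rows (const int *widths, size_t n, int cols)
{
    int rows = 0;
    int col = 0;
    size_t i;

    for (i = 0; i < n; i++)
    {
        /* Wrap only if something is already on this row */
        if (col + widths[i] > cols && col > 0)
        {
            rows++;
            col = 0;
        }
        col += widths[i];
    }

    /* Count the last, possibly partial, row */
    return rows + (col > 0 ? 1 : 0);
}
```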
716
717 /* --------------------------------------------------------------------------------------------- */
718 /**
719 * Parse, format and possibly display one paragraph (perhaps not from the beginning).
720 *
721 * Formatting starts at the given "state" (which encodes the file offset and parser and formatter's
722 * internal state). In unwrap mode, this should point to the beginning of the paragraph with the
723 * default state, the additional horizontal scrolling is added here. In wrap mode, this may point
724 * to the beginning of the line within a paragraph (to display the partial paragraph at the top),
725 * with the proper state at that point.
726 *
727 * Displaying the next paragraph should start at "state"'s new value, or if we displayed the bottom
728 * line then state->offset tells the file offset to be shown in the top bar.
729 *
730 * If "row" is negative, don't display the first abs(row) lines and display the rest from the top.
731 * This was a nice idea but it's now unused :)
732 *
733 * If "row" is too large, don't display the paragraph at all but still return the number of lines.
734 * This is used when moving upwards.
735 *
736 * @param view ...
737 * @param state the parser-formatter state machine's state, updated
738 * @param row print starting at this row
739 * @return the number of rows the paragraph is wrapped to, that is, 0 if we were already at EOF,
740 * otherwise 1 in unwrap mode, >= 1 in wrap mode. We stop when reaching the bottom of the
741 * viewport, it's not counted how many more lines the paragraph would occupy
742 */
743 static int
744 mcview_display_paragraph (WView *view, mcview_state_machine_t *state, int row)
745 {
746 int lines = 0;
747
748 while (TRUE)
749 {
750 gboolean paragraph_ended;
751
752 lines += mcview_display_line (view, state, row, &paragraph_ended, NULL);
753 if (paragraph_ended)
754 return lines;
755
756 if (row < view->data_area.lines)
757 {
758 row++;
759 // stop if bottom of screen reached
760 if (row >= view->data_area.lines)
761 return lines;
762 }
763 }
764 }
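
/*
 * Illustration only, not part of the viewer: a minimal sketch of the row-counting loop above,
 * assuming a toy formatter in which every row simply consumes up to "width" cells. The name
 * toy_paragraph_rows() and its parameters are hypothetical, and the EOF case (return value 0)
 * is not modelled.
 *
```c
#include <assert.h>

// Count the rows a paragraph of "cells" columns occupies when wrapped at "width",
// starting at screen row "row" in a viewport of "height" rows.
static int
toy_paragraph_rows (int cells, int width, int row, int height)
{
    int lines = 0;

    assert (cells >= 0 && width > 0 && height > 0);

    while (1)
    {
        // Format one row: it consumes up to "width" cells of the paragraph.
        cells = cells > width ? cells - width : 0;
        lines++;
        if (cells == 0)
            return lines;       // paragraph ended

        if (row < height)
        {
            row++;
            // stop if bottom of screen reached
            if (row >= height)
                return lines;
        }
        // If row was already off-screen, keep counting without displaying.
    }
}
```
 *
 * With width 80 and a 25-row viewport, a 200-cell paragraph started at row 0 yields 3, started
 * at row 24 it yields 1 (the bottom was reached), and started at row 25 (off-screen, as used
 * when moving upwards) it yields the full height of 3 again.
 */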
765
766 /* --------------------------------------------------------------------------------------------- */
767 /**
768 * Recompute dpy_state_top from dpy_start and dpy_paragraph_skip_lines. Clamp
769 * dpy_paragraph_skip_lines if necessary.
770 *
771 * This method should be called in wrap mode after changing one of the parsing or formatting
772 * properties (e.g. window width, encoding, nroff), or when switching to wrap mode from unwrap or
773 * hex.
774 *
775 * If we stayed within the same paragraph then try to keep the vertical offset within that
776 * paragraph as well. It might happen though that the paragraph became shorter than our desired
777 * vertical position, in that case move to its last row.
778 */
779 static void
780 mcview_wrap_fixup (WView *view)
781 {
782 int lines = view->dpy_paragraph_skip_lines;
783
784 if (!view->dpy_wrap_dirty)
785 return;
786 view->dpy_wrap_dirty = FALSE;
787
788 view->dpy_paragraph_skip_lines = 0;
789 mcview_state_machine_init (&view->dpy_state_top, view->dpy_start);
790
791 while (lines-- != 0)
792 {
793 mcview_state_machine_t state_prev;
794 gboolean paragraph_ended;
795
796 state_prev = view->dpy_state_top;
797 if (mcview_display_line (view, &view->dpy_state_top, -1, &paragraph_ended, NULL) == 0)
798 break;
799 if (paragraph_ended)
800 {
801 view->dpy_state_top = state_prev;
802 break;
803 }
804 view->dpy_paragraph_skip_lines++;
805 }
806 }
807
808 /* --------------------------------------------------------------------------------------------- */
809 /*** public functions ****************************************************************************/
810 /* --------------------------------------------------------------------------------------------- */
811
812 /**
813 * In both wrap and unwrap modes, dpy_start points to the beginning of the paragraph.
814 *
815 * In unwrap mode, start displaying from this position, probably applying an additional horizontal
816 * scroll.
817 *
818 * In wrap mode, an additional dpy_paragraph_skip_lines lines are skipped from the top of this
819 * paragraph. dpy_state_top contains the position and parser-formatter state corresponding to the
820 * top left corner so we can just start rendering from here. Unless dpy_wrap_dirty is set in which
821 * case dpy_state_top is invalid and we need to recompute first.
822 */
823 void
824 mcview_display_text (WView *view)
825 {
826 const WRect *r = &view->data_area;
827 int row;
828 mcview_state_machine_t state;
829 gboolean again;
830
831 do
832 {
833 int n;
834
835 again = FALSE;
836
837 mcview_display_clean (view);
838 mcview_display_ruler (view);
839
840 if (!view->mode_flags.wrap)
841 mcview_state_machine_init (&state, view->dpy_start);
842 else
843 {
844 mcview_wrap_fixup (view);
845 state = view->dpy_state_top;
846 }
847
848 for (row = 0; row < r->lines; row += n)
849 {
850 n = mcview_display_paragraph (view, &state, row);
851 if (n == 0)
852 {
853 /* In the rare case that displaying didn't start at the beginning
854 * of the file, yet there are some empty lines at the bottom,
855 * scroll the file and display again. This happens when e.g. the
856 * window is made bigger, or the file becomes shorter due to
857 * charset change or enabling nroff. */
858 if ((view->mode_flags.wrap ? view->dpy_state_top.offset : view->dpy_start) > 0)
859 {
860 mcview_ascii_move_up (view, r->lines - row);
861 again = TRUE;
862 }
863 break;
864 }
865 }
866 }
867 while (again);
868
869 view->dpy_end = state.offset;
870 view->dpy_state_bottom = state;
871
872 tty_setcolor (VIEWER_NORMAL_COLOR);
873 if (mcview_show_eof != NULL && mcview_show_eof[0] != '\0')
874 while (row < r->lines)
875 {
876 widget_gotoyx (view, r->y + row, r->x);
877 // TODO: should make it no wider than the viewport
878 tty_print_string (mcview_show_eof);
879 row++;
880 }
881 }
882
883 /* --------------------------------------------------------------------------------------------- */
884 /**
885 * Move down.
886 *
887 * It's very simple. Just invisibly format the next "lines" lines, carefully carrying the formatter
888 * state in wrap mode. But before each step we need to check if we've already hit the end of the
889 * file; in that case we can no longer move. This is done by walking from dpy_state_bottom.
890 *
891 * Note that this relies on mcview_display_text() setting dpy_state_bottom to its correct value
892 * upon rendering the screen contents. So don't call this function from other functions (e.g. at
893 * the bottom of mcview_ascii_move_up()) which invalidate this value.
894 */
895 void
896 mcview_ascii_move_down (WView *view, off_t lines)
897 {
898 while (lines-- != 0)
899 {
900 gboolean paragraph_ended;
901
902 /* See if there's still data below the bottom line, by imaginarily displaying one
903 * more line. This takes care of reading more data into growbuf, if required.
904 * If the end position didn't advance, we're at EOF and hence bail out. */
905 if (mcview_display_line (view, &view->dpy_state_bottom, -1, &paragraph_ended, NULL) == 0)
906 break;
907
908 /* Okay, there's enough data. Move by 1 row at the top, too. No need to check for
909 * EOF, that can't happen. */
910 if (!view->mode_flags.wrap)
911 {
912 view->dpy_start = mcview_eol (view, view->dpy_start);
913 view->dpy_paragraph_skip_lines = 0;
914 view->dpy_wrap_dirty = TRUE;
915 }
916 else
917 {
918 mcview_display_line (view, &view->dpy_state_top, -1, &paragraph_ended, NULL);
919 if (!paragraph_ended)
920 view->dpy_paragraph_skip_lines++;
921 else
922 {
923 view->dpy_start = view->dpy_state_top.offset;
924 view->dpy_paragraph_skip_lines = 0;
925 }
926 }
927 }
928 }
929
930 /* --------------------------------------------------------------------------------------------- */
931 /**
932 * Move up.
933 *
934 * Unwrap mode: Piece of cake. Wrap mode: If we'd walk back more than the current line offset
935 * within the paragraph, we need to jump back to the previous paragraph and compute its height to
936 * see if we start from that paragraph, and repeat this if necessary. Once we're within the desired
937 * paragraph, we still need to format it from its beginning to know the state.
938 *
939 * See the top of this file for comments about MAX_BACKWARDS_WALK_IN_PARAGRAPH.
940 *
941 * force_max is a protection against the rare extreme case that the file underneath us changes:
942 * we don't want to endlessly consume a file that may be full of zeros when moving upwards.
943 */
944 void
945 mcview_ascii_move_up (WView *view, off_t lines)
946 {
947 if (!view->mode_flags.wrap)
948 {
949 while (lines-- != 0)
950 view->dpy_start = mcview_bol (view, view->dpy_start - 1, 0);
951 view->dpy_paragraph_skip_lines = 0;
952 view->dpy_wrap_dirty = TRUE;
953 }
954 else
955 {
956 int i;
957
958 while (lines > view->dpy_paragraph_skip_lines)
959 {
960 // We need to go back to the previous paragraph.
961 if (view->dpy_start == 0)
962 {
963 // Oops, we're already in the first paragraph.
964 view->dpy_paragraph_skip_lines = 0;
965 mcview_state_machine_init (&view->dpy_state_top, 0);
966 return;
967 }
968 lines -= view->dpy_paragraph_skip_lines;
969 view->force_max = view->dpy_start;
970 view->dpy_start = mcview_bol (view, view->dpy_start - 1,
971 view->dpy_start - MAX_BACKWARDS_WALK_IN_PARAGRAPH);
972 mcview_state_machine_init (&view->dpy_state_top, view->dpy_start);
973 /* This is a tricky way of denoting that we're at the end of the paragraph.
974 * Normally we'd jump to the next paragraph and reset paragraph_skip_lines. But for
975 * walking backwards this is exactly what we need. */
976 view->dpy_paragraph_skip_lines =
977 mcview_display_paragraph (view, &view->dpy_state_top, view->data_area.lines);
978 view->force_max = -1;
979 }
980
981 /* Okay, we have dpy_start pointing to the desired paragraph, and we still need to
982 * walk back "lines" lines from the current "dpy_paragraph_skip_lines" offset. We can't do
983 * that, so walk from the beginning of the paragraph. */
984 mcview_state_machine_init (&view->dpy_state_top, view->dpy_start);
985 view->dpy_paragraph_skip_lines -= lines;
986 for (i = 0; i < view->dpy_paragraph_skip_lines; i++)
987 mcview_display_line (view, &view->dpy_state_top, -1, NULL, NULL);
988 }
989 }
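
/*
 * Illustration only, not part of the viewer: a minimal sketch of the wrap-mode backwards walk
 * described above, assuming a simplified model in which the file is an array of precomputed
 * paragraph heights (in screen rows). The names toy_pos_t and toy_move_up() are hypothetical;
 * the real code has to re-format each paragraph to learn its height and state, and bounds the
 * walk with MAX_BACKWARDS_WALK_IN_PARAGRAPH and force_max.
 *
```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    size_t paragraph;   // index of the paragraph at the top (stands in for dpy_start)
    int skip_lines;     // rows of that paragraph scrolled off the top
} toy_pos_t;

// Move the top of the viewport up by "lines" rows, clamping at the start of the file.
static void
toy_move_up (const int *heights, toy_pos_t *pos, int lines)
{
    assert (heights != NULL && pos != NULL && lines >= 0);

    while (lines > pos->skip_lines)
    {
        // We need to go back to the previous paragraph.
        if (pos->paragraph == 0)
        {
            // Already in the first paragraph: clamp to the very top.
            pos->skip_lines = 0;
            return;
        }
        lines -= pos->skip_lines;
        pos->paragraph--;
        // Pretend we are just past the last row of the previous paragraph.
        pos->skip_lines = heights[pos->paragraph];
    }

    // Now within the desired paragraph: walk back the remaining rows.
    pos->skip_lines -= lines;
}
```
 *
 * For heights {3, 1, 4} and a position of paragraph 2 with one skipped row, moving up 3 rows
 * lands on paragraph 0 with two skipped rows; moving up 10 rows clamps to the top of the file.
 */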
990
991 /* --------------------------------------------------------------------------------------------- */
992
993 void
994 mcview_ascii_moveto_bol (WView *view)
995 {
996 if (!view->mode_flags.wrap)
997 view->dpy_text_column = 0;
998 }
999
1000 /* --------------------------------------------------------------------------------------------- */
1001
1002 void
1003 mcview_ascii_moveto_eol (WView *view)
1004 {
1005 if (!view->mode_flags.wrap)
1006 {
1007 mcview_state_machine_t state;
1008 off_t linewidth;
1009
1010 // Get the width of the topmost paragraph.
1011 mcview_state_machine_init (&state, view->dpy_start);
1012 mcview_display_line (view, &state, -1, NULL, &linewidth);
1013 view->dpy_text_column = DOZ (linewidth, (off_t) view->data_area.cols);
1014 }
1015 }
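
/*
 * Illustration only, not part of the viewer: a minimal sketch of the end-of-line column
 * computation above, assuming DOZ means "difference or zero". TOY_DOZ and toy_eol_column()
 * are hypothetical stand-ins, not the project's definitions.
 *
```c
#include <assert.h>

// Assumed semantics: a - b when a > b, otherwise 0 (never negative).
#define TOY_DOZ(a, b) ((a) > (b) ? (a) - (b) : 0)

// Horizontal scroll offset that puts the end of a line at the right edge of the viewport.
static long
toy_eol_column (long line_width, long viewport_cols)
{
    assert (line_width >= 0 && viewport_cols > 0);
    return TOY_DOZ (line_width, viewport_cols);
}
```
 *
 * A 200-column line in an 80-column viewport scrolls to column 120; a line that already fits
 * stays at column 0.
 */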
1016
1017 /* --------------------------------------------------------------------------------------------- */
1018
1019 void
1020 mcview_state_machine_init (mcview_state_machine_t *state, off_t offset)
1021 {
1022 memset (state, 0, sizeof (*state));
1023 state->offset = offset;
1024 state->print_lonely_combining = TRUE;
1025 }
1026
1027 /* --------------------------------------------------------------------------------------------- */