I do think that the brain is specifically designed so that incoming sensory input will swamp internally generated images, and that you have to fight that.
For example, I think I saw a nicely researched book on stigmata: people with psychosomatic wounds, usually modelled on the supposed wounds of Christ in this part of the world. There was an example where the author tried suggesting that bleeding from the eyes was appropriate, and this appeared within about 10 hours. He seemed to consider this evidence that it was all driven by internal expectations rather than magic. It might also have been driven by fingernails, of course :) Judging by the psychic literature, one needs a Randi-type guy involved to be sure. But there does seem reason to believe that interesting effects can be achieved, way beyond the norm, with appropriate training.
A hopeful part is that the brain clearly does routinely actively synthesize stuff even in sensory driven mode: the way it fills in the blind spot is one obvious example.
You probably remember me bouncing off Lanya the observation that rotating an image seems to help stabilize it. She promptly came back with an agreement... since she's apparently fraudulent, likely the first she'd heard of it :)
But I think that works because it activates one of the synthesis mechanisms: I still find it very difficult to understand how the brain can deal with images moving across the retina, but it seems to me that at _some_ level it has to involve basically implementing a shift operation on a sheet of cells: copying the information in one cell to an adjacent one, replicated all over a 2-D map somewhere. (If you can think of another possible mechanism, I'd be happy to hear of it; I can't think of anything less fantastic.)
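To make the "shift operation on a sheet of cells" concrete, here is a minimal sketch in Python. It is only an illustration of the copy-to-neighbour idea, not a claim about how neurons do it; the name `shift`, the list-of-lists sheet, and the `fill` value for cells that scroll in from the edge are all assumptions made for the example.

```python
def shift(sheet, dx, dy, fill=0):
    """Copy every cell of a 2-D sheet to the cell dx columns right and dy rows down.

    Cells that would land outside the sheet are dropped; cells with no
    source are set to `fill` (an arbitrary choice for this sketch).
    """
    rows, cols = len(sheet), len(sheet[0])
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if 0 <= nr < rows and 0 <= nc < cols:
                out[nr][nc] = sheet[r][c]
    return out


sheet = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
shifted = shift(sheet, dx=1, dy=0)  # the whole pattern moves one cell to the right
```

Every cell does the same trivial thing (hand its value to a fixed neighbour), which is what makes the operation plausible to replicate all over a map.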
Copying information being a classical computer trick for refreshing it, anything that forces the brain to copy an image from place to place is a candidate for strengthening it. I think one reason images always vanish just when you least want them to is that when you get interested, you instinctively freeze them, which kills the refresh cycle, making it fade. Rotating a 3-D object would presumably refresh some 2.5D buffer somewhere; panning back and forth in the scene might refresh the same buffer, or might refresh a separate earlier 2D buffer, or both, or I dunno.
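The refresh-by-copying argument can be sketched as a toy simulation. This is a cartoon under loud assumptions: the decay rate, the read threshold, and the wrap-around 1-D "buffer" are all invented for illustration, loosely in the spirit of dynamic-memory refresh, and none of it is measured from anything.

```python
def fade(cells, decay=0.5):
    """Every cell loses a fixed fraction of its strength per step (assumed rate)."""
    return [v * decay for v in cells]


def copy_shift(cells, threshold=0.1):
    """Read each cell as on/off, then rewrite it at full strength one cell over.

    The wrap-around at the end of the list is just a convenience for the sketch.
    """
    read = [1.0 if v > threshold else 0.0 for v in cells]
    return read[-1:] + read[:-1]


def simulate(cells, steps, refresh):
    """Let the pattern fade for `steps` steps, optionally copying it after each one."""
    for _ in range(steps):
        cells = fade(cells)
        if refresh:
            cells = copy_shift(cells)
    return cells


image = [1.0, 0.0, 1.0, 0.0]
frozen = simulate(image, 4, refresh=False)  # held still: just fades
moving = simulate(image, 4, refresh=True)   # kept moving: rewritten at full strength
```

In this toy the frozen pattern drops below the read threshold after four steps, while the moving one is still at full strength, which is the qualitative behaviour the paragraph is guessing at.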
There clearly seem to be some other synthesis mechanisms one can trigger: When I get a beautiful crystal-clear image out of the blue, it appears subjectively to be triggered by some particular stimulus, to get filled in rapidly but not instantaneously by some refresh mechanism which in my interpretation is intended to do interpolation, but is winding up synthesizing de novo in the absence of competing stimuli, and then it dies for lack of adequate refresh to keep it all alive, somehow.
So one approach is to identify the refresh mechanisms vs the input mechanisms, and look for systematic ways to amplify the refresh mechanisms while suppressing the input mechanisms. Sandy's comments and anecdotes both suggest that mechanical suppression of input by itself -- a sensory deprivation tank -- is quite sufficient to do the trick. I don't know if anyone has tried extended practice of first learning to control hallucination in a tank, then gradually decreasing the sensory deprivation over time while practicing maintaining control, but that would be one obvious attack. Good luck getting funded :) The equipment is pretty cheap, however.
Have you put together what is known of the visual architecture into a coherent model of the logical buffers and pathways? I've seen lots of scraps, but I don't remember a synthesis of them. E.g., I remember SciAmerican a decade or so back (?) had an experiment demonstrating that the color pathways are merged quite late in the process, and demonstrating an after-image effect based on this which lasts weeks instead of the usual minutes. That sort of effortless introduction of long-term visual effects might be worth focussing on for practical reasons, too :). Given such a model of the architecture, you might be better set to look for internal image-copy operations between and within buffers that might be candidates to amplify and preserve an image.
An aspect that really puzzles me is the intensity of visual hallucinations. If I close my eyes and try consciously, I get pale stuff almost lost in the noise from the back of my eyelids, in general. But I've had spontaneous hallucinations of headlights shining in my eyes which seemed intense enough to be painful and just about trigger an eye-watering response... if that's not some sort of higher-level hallucination, it suggests that the potential is there to produce very strong signals, improving the chances of over-riding visual input.
Poul Anderson insists that every good descriptive scene should appeal to at least three senses. If the fundamental problem is that visualizations are competing for mental statespace with other, contradictory signals, internal and external, it may be that it is -easier- rather than harder to begin with multimedia controlled hallucinations: The different modalities may re-inforce each other in competing with the inconsistent external inputs.
So you might be better off imagining not just the image of a steel harp, say, but also the sound, the coolness of the metal, the vibration of the soundbox against your chest, the smell of the wood, the pressure on your fingertips, the somatic sense of arm position, and so forth.
It would be unintuitive by 20th-century reductionist scientific culture standards if the simplest controlled hallucination exercise consisted of a maximum amount of activity in a maximum number of sensory modalities :). But if this offers the maximum possibilities for copying information back and forth between various parts of the brain, amplifying it along the way, it might turn out to be the way to go.
(Aside: Notice how strong and long-lasting motion compensation effects seem to be compared to most other effects: "Getting your sea legs" seems to take forever, and once you have it, level land seems to rock under your feet for a long time. If you whirl in a chair awhile, you feel dizzy subsequently, and the visual world seems to track along, for a much longer interval than that of the typical auditory or visual hallucination. This might make sense if you accept that they are all adaptations based on institutionalizing regular internal copying of information, and that such copying is a powerful way of preventing the normal decay of internal states. Particularly impressive is the way that a steadily rotating 3-D wireframe on the screen seems very strongly to be moving in the opposite direction after you stop it. There was an impressive SciAmerican article the usual "few years back" that demonstrated that the time it takes people to recognize two 3-D views of the same object as matching is directly proportional to the angle by which they have been rotated, strongly suggesting that the match is done by a smooth mental rotation, hence that smooth mental rotations are implemented. Would it be interesting to do quantitative experiments on the 3-D rotation speed after-effect? Ask people to turn a knob to get an object rotating so many degrees per second. Let them watch an object rotating clockwise sixty degrees per second for several minutes, then see if that affects the accuracy or speed with which they can set a speed of ten degrees per second counterclockwise. If so, you've found a quantitative way of measuring the rotation habituation after-effect, and might be able to learn something from the numbers...?)
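One way the knob experiment in the aside might be scored, as a sketch only: compare the average signed error of the knob settings before and after the adaptation period. The function names, the sign convention (clockwise positive, so ten degrees per second counterclockwise is -10.0), and above all the numbers are assumptions; the values below are made-up placeholders to show the call shape, not data from any experiment.

```python
from statistics import mean


def signed_error(settings, target):
    """Mean of (knob setting - target speed), in degrees per second."""
    return mean(s - target for s in settings)


def aftereffect(baseline, post_adaptation, target):
    """Change in mean signed error between baseline and post-adaptation trials."""
    return signed_error(post_adaptation, target) - signed_error(baseline, target)


TARGET = -10.0  # ten degrees per second counterclockwise, clockwise taken as positive

# Placeholder numbers, invented purely for illustration.
baseline_settings = [-10.5, -9.5, -10.0]
post_settings = [-12.0, -13.0, -14.0]

shift_in_setting = aftereffect(baseline_settings, post_settings, TARGET)
```

A nonzero `shift_in_setting` that holds up across subjects would be the "number" to learn from; whether it comes out positive, negative, or zero is exactly what the experiment would have to tell us.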
Another training regime I was considering using to approach this problem was working from the interpolation angle: The logic we're using to visualize seems to me primarily designed to do interpolation normally: I was going to take a shot at giving it incrementally more interpolation work to do. For example, the internal buffer seems designed in part to preserve our image of the world even while we blink, or during the interval between fixations when the eye saccades. (Question: How do you think the brain compensates for a saccade? Does everything in some buffer get copied sideways shift-register fashion as fast as the eye can move, and as often? That would seem remarkable. Can you think of a plausible but less remarkable mechanism? The above-cited results suggesting that we can and do implement smooth mental rotations can be interpreted as support for an internal shift-register type mechanism. How else would you implement smooth mental rotations of complex objects in neural hardware?) What happens if one systematically increases the need for interpolation? Can one condition the nervous system to accept more and more of the image being synthesized, by cutting back steadily on the available information? Would one eventually reach the point at which there is so little actual information entering into one's perception that it is in fact mostly hallucination anyhow, and controlled hallucination becomes trivial?
E.g., to increase the need for interpolation in time, one might use a computer screen, or wear LCD shutters, which obscure the view for steadily larger proportions of the time. If you wear LCD glasses set to be opaque for every other cycle at 30cps, do you adapt? What if you then start steadily decreasing the number of transparent cycles until maybe you're only getting two 1/30th-of-a-second flashes per second? Do you wind up with a subjectively continuous visual field after a couple of weeks of habituation? Remember that around WWII, people tried wearing inverting prisms for weeks at a time and found that the visual system eventually compensated and left them with subjectively normal sight in some sense: considerable adaptation is possible even in the adult.
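The shutter regime above can be written down as a schedule. This is a hedged sketch: the 30 cycles per second, the every-other-cycle starting point, and the two-flashes-per-second end point come from the text, while the function names, the even spacing of open cycles, the linear day-by-day taper, and the 14-day length are assumptions picked for the example.

```python
def shutter_pattern(open_per_second, rate=30):
    """One second of shutter states: True = transparent cycle, False = opaque cycle."""
    if not 0 <= open_per_second <= rate:
        raise ValueError("open_per_second must be between 0 and rate")
    # Spread the transparent cycles as evenly as possible across the second.
    open_slots = {(i * rate) // open_per_second for i in range(open_per_second)}
    return [slot in open_slots for slot in range(rate)]


def training_schedule(days, start=15, end=2):
    """Transparent cycles per second for each day, stepping linearly from start to end."""
    if days < 2:
        return [end]
    return [round(start + (end - start) * d / (days - 1)) for d in range(days)]


first_day = shutter_pattern(15)   # opaque every other cycle at 30 cycles per second
last_day = shutter_pattern(2)     # two 1/30-second flashes per second
schedule = training_schedule(14)  # assumed two-week taper
```

Whether the taper should be linear, or slower near the end where each lost flash is a bigger fraction of what remains, is an open design choice.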
Obviously, one could do similar interpolation-in-space experiments with a computer screen or appropriately programmed VR display, presuming you had one you could bear to wear for weeks on end: Start by dropping every other pixel in a checkerboard pattern. Does your eye compensate and give you a perceptually continuous field after a week or two? The example of the blind spot would suggest that it will; the example of people who lose half their visual cortex to a stroke but still wind up with a perceptually continuous world which merely happens to omit half the surrounding space would also suggest that it will. If so, how far can you go with the pixel-dropping process before the mechanism gives out? As the ratio of hallucination to actual visual input rises, do you find it easier to over-ride the remaining visual input and achieve controlled hallucination? Does it make any difference whether the dropped pixels are in a fixed pattern, or one which changes over time? If so, does it matter whether they shift in a smooth pattern over the visual field, or blink on and off randomly?
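The pixel-dropping variants asked about above (fixed checkerboard, a pattern that shifts over time, random blinking) are easy to sketch as masks. The names `checkerboard`, `random_mask`, and `apply_mask`, the `phase` parameter, and the choice to blank dropped pixels to 0 are all assumptions for illustration; a real display would do this per frame in whatever graphics layer it uses.

```python
import random


def checkerboard(rows, cols, phase=0):
    """Fixed pattern: True = pixel shown, False = pixel dropped.

    Bumping `phase` by one each frame swaps shown and dropped cells,
    a crude stand-in for a pattern that shifts over the visual field.
    """
    return [[(r + c + phase) % 2 == 0 for c in range(cols)] for r in range(rows)]


def random_mask(rows, cols, keep_fraction, rng):
    """Random pattern: each pixel is shown independently with probability keep_fraction."""
    return [[rng.random() < keep_fraction for _ in range(cols)] for _ in range(rows)]


def apply_mask(image, mask, blank=0):
    """Replace dropped pixels with `blank`, leaving shown pixels untouched."""
    return [
        [px if keep else blank for px, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


image = [[5, 6], [7, 8]]
fixed = apply_mask(image, checkerboard(2, 2))
swapped = apply_mask(image, checkerboard(2, 2, phase=1))
blinking = apply_mask(image, random_mask(2, 2, keep_fraction=0.5, rng=random.Random(0)))
```

Lowering `keep_fraction` over the weeks is the spatial analogue of cutting back on transparent shutter cycles.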
Interpolation in brightness might also be considered: If you experience a systematically dimmer or brighter visual world, does that influence the ability to override it? Brightness compensation is mostly done mechanically, however, and in any event very early in processing: I'd expect little from this particular line.
Given the generally strong effects of rotation, possibly related to the need to institutionalize copying operations, one might wonder about habituating to a world which is rotating in some fashion. If it were rotating left-to-right or top-to-bottom, the eye would normally compensate simply by tracking mechanically; unless this were somehow prevented, one would expect little in the way of interesting habituation from this. But what if the world were rotating around the Z axis like a pinwheel? The eye can rotate only slightly around the visual axis, so compensation would have to be done mostly neurologically. If you spend a few weeks viewing a world that rotates constantly this way, do you adapt? If so, does this invoke a strong habitual copying mechanism that makes after-images last systematically longer? Does it make voluntary visualizations last longer too, piggybacking on the acquired image-copying habit?
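For the pinwheel case, the display side is just a rotation of every screen point about the viewing axis by an angle that grows with time. A minimal sketch, assuming screen coordinates centred on the line of sight; the function names and the 60 degrees per second rate (borrowed from the knob experiment earlier) are assumptions for the example.

```python
import math


def roll_angle(t_seconds, deg_per_second):
    """Current roll of the world about the viewing (Z) axis, wrapped to [0, 360)."""
    return (t_seconds * deg_per_second) % 360.0


def roll(x, y, degrees):
    """Rotate a screen point about the origin (taken to be the line of sight)."""
    a = math.radians(degrees)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))


angle = roll_angle(7.0, 60.0)    # seven seconds in, at the assumed rate
point = roll(1.0, 0.0, 90.0)     # a point on the x axis, rolled a quarter turn
```

Applying `roll` with the current `roll_angle` to every point of each frame gives the constantly rotating world; stopping the rotation after the habituation period is where any after-effect would show up.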
If you want to make people nauseated and headachy, all this stuff should be one reasonable try *giggle*.